
Review Article

Erik Cambria
School of Computer Engineering, Nanyang Technological University

Bebo White
SLAC National Accelerator Laboratory, Stanford University

Jumping NLP Curves: A Review of Natural Language Processing Research

Digital Object Identifier 10.1109/MCI.2014.2307227
Date of publication: 11 April 2014
IEEE COMPUTATIONAL INTELLIGENCE MAGAZINE | MAY 2014 | 1556-603X/14/$31.00 ©2014 IEEE

Natural language processing (NLP) is a theory-motivated range of computational techniques for the automatic analysis and representation of human language. NLP research has evolved from the era of punch cards and batch processing (in which the analysis of a sentence could take up to 7 minutes) to the era of Google and the likes of it (in which millions of webpages can be processed in less than a second). This review paper draws on recent developments in NLP research to look at the past, present, and future of NLP technology in a new light. Borrowing the paradigm of 'jumping curves' from the field of business management and marketing prediction, this survey article reinterprets the evolution of NLP research as the intersection of three overlapping curves, namely Syntactics, Semantics, and Pragmatics Curves, which will eventually lead NLP research to evolve into natural language understanding.

1. Introduction
Between the birth of the Internet and 2003, year of birth of social networks such as MySpace, Delicious, LinkedIn, and Facebook, there were just a few dozen exabytes of information on the Web. Today, that same amount of information is created weekly. The advent of the Social Web has provided people with new content-sharing services that allow them to create and share their own contents, ideas, and opinions, in a time- and cost-efficient way, with virtually millions of other people connected to the World Wide Web. This huge amount of information, however, is mainly unstructured (because it is specifically produced for human consumption) and hence not directly machine-processable. The automatic analysis of text involves a deep understanding of natural language by machines, a reality from which we are still very far off.

Hitherto, online information retrieval, aggregation, and processing have mainly been based on algorithms relying on the textual representation of web pages. Such algorithms are very good at retrieving texts, splitting them into parts, checking the spelling and counting the number of words. When it comes to interpreting sentences and extracting meaningful information, however, their capabilities are known to be very limited. Natural language processing (NLP), in fact, requires high-level symbolic capabilities (Dyer, 1994), including:
- creation and propagation of dynamic bindings;
- manipulation of recursive, constituent structures;
- acquisition and access of lexical, semantic, and episodic memories;
- control of multiple learning/processing modules and routing of information among such modules;
- grounding of basic-level language constructs (e.g., objects and actions) in perceptual/motor experiences;
- representation of abstract concepts.

All such capabilities are required to shift from mere NLP to what is usually referred to as natural language understanding (Allen, 1987). Today, most of the existing approaches are still based on the syntactic representation of text, a method that relies mainly on word co-occurrence frequencies. Such algorithms are limited by the fact that they can process only the information that they can 'see'. As human text processors, we do not have such limitations as every word we see activates a cascade of semantically related concepts, relevant episodes, and sensory experiences, all of which enable the completion of complex NLP tasks, such as word-sense disambiguation, textual entailment, and semantic role labeling, in a quick and effortless way.

Computational models attempt to bridge such a cognitive gap by emulating the way the human brain processes natural language, e.g., by leveraging on semantic features that are not explicitly expressed in text. Computational models are useful both for scientific purposes (such as exploring the nature of linguistic communication), as well as for
practical purposes (such as enabling effective human-machine communication). Traditional research disciplines do not have the tools to completely address the problem of how language comprehension and production work. Even if you combine all the approaches, a comprehensive theory would be too complex to be studied using traditional methods. However, we may be able to realize such complex theories as computer programs and then test them by observing how well they perform. By seeing where they fail, we can incrementally improve them. Computational models may provide very specific predictions about human behaviors that can then be explored by the psycholinguist. By continuing this process, we may eventually acquire a deeper understanding of how human language processing occurs. To realize such a dream will take the combined efforts of forward-thinking psycholinguists, neuroscientists, anthropologists, philosophers, and computer scientists.

Unlike previous surveys focusing on specific aspects or applications of NLP research (e.g., evaluation criteria (Jones & Galliers, 1995), knowledge-based systems (Mahesh, Nirenburg, & Tucker, 1997), text retrieval (Jackson & Moulinier, 1997), and connectionist models (Christiansen & Chater, 1999)), this review paper focuses on the evolution of NLP research according to three different paradigms, namely: the bag-of-words, bag-of-concepts, and bag-of-narratives models. Borrowing the concept of 'jumping curves' from the field of business management, this survey article explains how and why NLP research has been gradually shifting from lexical semantics to compositional semantics and offers insights on next-generation narrative-based NLP technology.

The rest of the paper is organized as follows: Section 2 presents the historical background and the different schools of thought of NLP research; Section 3 discusses past, present, and future evolution of NLP technologies; Section 4 describes traditional syntax-centered NLP methodologies; Section 5 illustrates emerging semantics-based NLP approaches; Section 6 introduces pioneering works on narrative understanding; Section 7 proposes further insights on the evolution of current NLP technologies and suggests near-future research directions; finally, Section 8 concludes the paper and outlines future areas of NLP research.

2. Background
Since its inception in the 1950s, NLP research has been focusing on tasks such as machine translation, information retrieval, text summarization, question answering, information extraction, topic modeling, and more recently, opinion mining. Most NLP research carried out in the early days focused on syntax, partly because syntactic processing was manifestly necessary, and partly through implicit or explicit endorsement of the idea of syntax-driven processing. Although the semantic problems and needs of NLP were clear from the very beginning, the strategy adopted by the research community was to tackle syntax first, for the more direct applicability of machine learning techniques. However, there were some researchers who concentrated on semantics because they saw it as the really challenging problem or assumed that semantically-driven processing would be a better approach. Thus, Masterman's and Ceccato's groups, for example, exploited semantic pattern matching using semantic categories and semantic case frames, and in Ceccato's work (Ceccato, 1967) particularly, world knowledge was used to extend linguistic semantics, along with semantic networks as a device for knowledge representation.

Later works recognized the need for external knowledge in interpreting and responding to language input (Minsky, 1968) and explicitly emphasized semantics in the form of general-purpose semantics with case structures for representation and semantically-driven processing (Schank, 1975).

One of the most popular representation strategies since then has been first order logic (FOL), a deductive system that consists of axioms and rules of inference and can be used to formalize relationally-rich predicates and quantification (Barwise, 1977). FOL supports syntactic, semantic and, to a certain degree, pragmatic expressions. Syntax specifies the way groups of symbols are to be arranged, so that the group of symbols is considered properly formed. Semantics specifies what well-formed expressions are supposed to mean. Pragmatics specifies how contextual information can be leveraged to provide better correlations between different semantics, which is essential for tasks such as word sense disambiguation. Logic, however, is known to have the problem of monotonicity. The set of entailed sentences will only increase as information is added to the knowledge base, but this runs the risk of violating a common property of human reasoning: the freedom and flexibility to change one's mind. Solutions such as default and linear logic serve to address parts of these issues. Default logic was proposed by Raymond Reiter to formalize default assumptions, e.g., 'all birds fly' (Reiter, 1980). However, issues arise when default logic formalizes facts that are true in the majority of cases but are false with regard to exceptions to these 'general rules', e.g., 'penguins do not fly'.

Another popular model for the description of natural language is the production rule (Chomsky, 1956). A production rule system keeps a working memory of on-going memory assertions. This working memory is volatile and in turn keeps a set of production rules. A production rule comprises an antecedent set of conditions and a consequent set of actions (i.e., IF <conditions> THEN <actions>). The basic operation for a production rule system involves a cycle of three steps ('recognize', 'resolve conflict', and 'act') that repeats until no more rules are applicable to the working memory. The step 'recognize' identifies the rules whose antecedent conditions are satisfied by the current working memory. The set of rules identified is also called the conflict set. The step 'resolve conflict' looks into the conflict set and selects a set of suitable rules to execute. The step 'act' simply executes the actions and updates the working memory. Production rules are modular.
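As an illustration of the 'recognize', 'resolve conflict', and 'act' cycle described above, the following is a minimal Python sketch, not an implementation taken from the cited literature. The names Rule, recognize, resolve_conflict, act, run, and RULES are hypothetical, as are the toy grammar facts. Two simplifying assumptions are made: the conflict-resolution step selects a single rule per cycle (the most specific one) rather than a set of rules, and a rule is considered applicable only if firing it would add something new to the working memory, which guarantees that the loop terminates.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    """IF <conditions> THEN <actions>, both modelled as sets of facts (strings)."""
    name: str
    conditions: FrozenSet[str]  # antecedent: facts that must all be in working memory
    actions: FrozenSet[str]     # consequent: facts asserted when the rule fires


def recognize(rules: Iterable[Rule], memory: Set[str]) -> List[Rule]:
    """Build the conflict set: rules whose conditions hold and that add a new fact."""
    return [r for r in rules if r.conditions <= memory and not r.actions <= memory]


def resolve_conflict(conflict_set: List[Rule]) -> Rule:
    """Assumed strategy: prefer the rule with the most conditions, ties broken by name."""
    return max(conflict_set, key=lambda r: (len(r.conditions), r.name))


def act(rule: Rule, memory: Set[str]) -> None:
    """Execute the rule's actions by asserting its facts into working memory."""
    memory |= rule.actions


def run(rules: Iterable[Rule], initial_facts: Iterable[str]) -> Tuple[Set[str], List[str]]:
    """Repeat recognize / resolve conflict / act until no rule is applicable."""
    rules = list(rules)
    memory = set(initial_facts)
    fired: List[str] = []
    while True:
        conflict_set = recognize(rules, memory)
        if not conflict_set:
            return memory, fired
        rule = resolve_conflict(conflict_set)
        act(rule, memory)
        fired.append(rule.name)


# Toy rules (illustrative only): build a noun phrase, then a sentence.
RULES = [
    Rule("S", frozenset({"noun_phrase", "verb"}), frozenset({"sentence"})),
    Rule("NP", frozenset({"determiner", "noun"}), frozenset({"noun_phrase"})),
]

memory, fired = run(RULES, {"determiner", "noun", "verb"})
print(fired)  # rule names in firing order
```

Because each rule is a self-contained value in the RULES list, adding or deleting a rule does not require touching any other rule, which is the modularity property noted in the text.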


Each rule is independent from the others, allowing rules to be added and deleted easily. Production rule systems have a simple control structure and the rules are easily understood by humans. This is because rules are usually derived from the observation of expert behavior or expert knowledge, thus the terminology used in encoding the rules tends to resonate with human understanding. However, there are issues with scalability when production rule systems become larger; a significant amount of maintenance is required to maintain a system with thousands of rules.

Another instance of a prominent NLP model is the ontology Web language (OWL) (McGuinness & Van Harmelen, 2004), an XML-based vocabulary that extends the resource description framework (RDF) to provide a more comprehensive set for ontology representation, such as the definition of classes, relationships between classes, properties of classes, and constraints on relationships between classes and their properties. RDF supports the subject-predicate-object model that makes assertions about a resource. RDF-based reasoning engines have been developed to check for semantic consistency, which then helps to improve ontology classification. In general, OWL requires the strict definition of static structures, and therefore is not suitable for representing knowledge that contains subjective degrees of confidence. Instead, it is more suited for representing declarative knowledge. Furthermore, yet another problem of OWL is that it does not allow for an easy representation of temporal-dependent knowledge.

Networks are yet another well-known way to do NLP. For example, Bayesian networks (Pearl, 1985) (also known as belief networks) provide a means of expressing joint probability distributions over many interrelated hypotheses. All variables are represented using a directed acyclic graph (DAG). Arcs are causal connections between two variables where the truth of the former directly affects the truth of the latter. A Bayesian network is able to represent subjective degrees of confidence. The representation explicitly explores the role of prior knowledge and combines pieces of evidence of the likelihood of events. In order to compute the joint distribution of the belief network, there is a need to know Pr(P|parents(P)) for each variable P. It is difficult to determine the probability of each variable P in the belief network. Hence, it is also difficult to enhance and maintain the statistical table for large-scale information processing problems. Bayesian networks also have limited expressiveness, which is only equivalent to the expressiveness of propositional logic. For this reason, semantic networks are more often used in NLP research.

A semantic network (Sowa, 1987) is a graphical notation for representing knowledge in patterns of interconnected nodes and arcs. Definitional networks focus on IsA relationships between a concept and a newly defined sub-type. The result of such a structure is called a generalization, which in turn supports the rule of inheritance for copying properties defined for a super-type to all of its sub-types. The information in definitional networks is often assumed to be true. Yet another kind of semantic network is the assertional network, which is meant to assert propositions and the information it contains is assumed to be contingently true. Contingent truth is not reached with the application of default logic; instead, it is based more on Man's application of common-sense. The proposition also has sufficient reason in which the reason entails the proposition, e.g., 'the stone is warm' with the sufficient reasons being 'the sun is shining on the stone' and 'whatever the sun shines on is warm'.

TABLE 1 Most popular schools of thought in knowledge representation and NLP research.
APPROACH | CHARACTERISTIC FEATURES | REFERENCE
PRODUCTION RULE | CYCLES OF 'RECOGNIZE', 'RESOLVE CONFLICT', 'ACT' STEPS | (CHOMSKY, 1956)
SEMANTIC PATTERN MATCHING | SEMANTIC CATEGORIES AND SEMANTIC CASE FRAMES | (CECCATO, 1967)
FIRST ORDER LOGIC (FOL) | AXIOMS AND RULES OF INFERENCES | (BARWISE, 1977)
BAYESIAN NETWORKS | VARIABLES REPRESENTED BY A PROBABILISTIC DIRECTED ACYCLIC GRAPH | (PEARL, 1985)
SEMANTIC NETWORKS | PATTERNS OF INTERCONNECTED NODES AND ARCS | (SOWA, 1987)
ONTOLOGY WEB LANGUAGE (OWL) | HIERARCHICAL CLASSES AND RELATIONSHIPS BETWEEN THEM | (MCGUINNESS & VAN HARMELEN, 2004)

The idea of semantic networks arose in the early 1960s from Simmons (Simmons, 1963) and Quillian (Quillian, 1963) and was further developed in the late 1980s by Marvin Minsky within his Society of Mind theory (Minsky, 1986), according to which the magic of human intelligence stems from our vast diversity, and not from any single, perfect principle. Minsky theorized that the mind is made of many little parts that he termed 'agents', each mindless by itself but able to lead to true intelligence when working together. These groups of agents, or 'agencies', are responsible for performing some type of function, such as remembering, comparing, generalizing, exemplifying, analogizing, simplifying, predicting, etc. Minsky's theory of human cognition, in particular, was welcomed with great enthusiasm by the artificial intelligence (AI) community and gave birth to many attempts to build common-sense knowledge bases for NLP tasks. The most representative projects are: (a) Cyc (Lenat & Guha, 1989), Doug Lenat's logic-based repository of common-sense knowledge; (b) WordNet (Fellbaum, 1998), Christiane Fellbaum's universal database of word senses; (c) ThoughtTreasure (Mueller, 1998), Erik Mueller's story understanding system; and (d) the Open Mind Common Sense project (Singh, 2002), a second-generation common-sense database. The last project stands out because knowledge is represented in natural
language (rather than being based upon a formal logical structure), and information is not hand-crafted by expert engineers but spontaneously inserted by online volunteers. Today, the common-sense knowledge collected by the Open Mind Common Sense project is being exploited for many different NLP tasks such as textual affect sensing (H. Liu, Lieberman, & Selker, 2003), casual conversation understanding (Eagle, Singh, & Pentland, 2003), opinion mining (Cambria & Hussain, 2012), storytelling (Hayden et al., 2013), and more.

3. Overlapping NLP Curves
With the dawn of the Internet Age, civilization has undergone profound, rapid-fire changes that we are experiencing more than ever today. Even technologies that are adapting, growing, and innovating have the gnawing sense that obsolescence is right around the corner. NLP research, in particular, has not evolved at the same pace as other technologies in the past 15 years.

While NLP research has made great strides in producing artificially intelligent behaviors, e.g., Google, IBM's Watson, and Apple's Siri, none of these NLP frameworks actually understands what it is doing, making them no different from a parrot that learns to repeat words without any clear understanding of what it is saying. Today, even the most popular NLP technologies view text analysis as a word or pattern matching task. Trying to ascertain the meaning of a piece of text by processing it at word-level, however, is no different from attempting to understand a picture by analyzing it at pixel-level.

In a Web where user-generated content (UGC) is drowning in its own output, NLP researchers are faced with the same challenge: the need to 'jump the curve' (Imparato & Harari, 1996) to make significant, discontinuous leaps in their thinking, whether it is about information retrieval, aggregation, or processing. Relying on arbitrary keywords, punctuation, and word co-occurrence frequencies has worked fairly well so far, but the explosion of UGC and the outbreak of deceptive phenomena, such as web-trolling and opinion spam, are causing standard NLP algorithms to be increasingly less efficient. In order to properly extract and manipulate text meanings, an NLP system must have access to a significant amount of knowledge about the world and the domain of discourse.

To this end, NLP systems will gradually stop relying too much on word-based techniques while starting to exploit semantics more consistently and, hence, make a leap from the Syntactics Curve to the Semantics Curve (Figure 1). NLP research has been interspersed with word-level approaches because, at first glance, the most basic unit of linguistic structure appears to be the word. Single-word expressions, however, are just a subset of concepts, multi-word expressions that carry specific semantics and sentics (Cambria & Hussain, 2012), that is, the denotative and connotative information commonly associated with real-world objects, actions, events, and people. Sentics, in particular, specifies the affective information associated with such real-world entities, which is key for common-sense reasoning and decision-making.

FIGURE 1 Envisioned evolution of NLP research through three different eras or curves. [Plot of NLP system performance against time (1950, 2000, 2050, 2100) showing the Syntactics Curve (Bag-of-Words), the Semantics Curve (Bag-of-Concepts), and the Pragmatics Curve (Bag-of-Narratives), together with a line labeled 'Best Path'.]

Semantics and sentics include common-sense knowledge (which humans normally acquire during the formative years of their lives) and common knowledge (which people continue to accrue in their everyday life) in a re-usable knowledge base for machines. Common knowledge includes general knowledge about the world, e.g., a chair is a type of furniture, while common-sense knowledge comprises obvious or widely accepted things that people normally know about the world but which are usually left unstated in discourse, e.g., that things fall downwards (and not upwards) and people smile when they are happy. The difference between common and common-sense knowledge can be expressed as the difference between knowing the name of an object and understanding the same object's purpose. For example, you can know the name of all the different kinds or brands of 'pipe', but not its purpose nor the method of usage. In other words, a pipe is not a pipe unless it can be used (Magritte, 1929) (Figure 2).

It is through the combined use of common and common-sense knowledge that we can have a grip on both high- and low-level concepts as well as nuances in natural language understanding and therefore effectively communicate with other people without having to continuously ask for definitions and explanations. Common-sense, in particular, is key in properly deconstructing natural language text into sentiments according to different contexts, for
example, in appraising the concept 'small room' as negative for a hotel review and 'small queue' as positive for a post office, or the concept 'go read the book' as positive for a book review but negative for a movie review.

FIGURE 2 A 'pipe' is not a pipe, unless we know how to use it.

Semantics, however, is just one layer up in the scale that separates NLP from natural language understanding. In order to achieve the ability to accurately and sensibly process information, computational models will also need to be able to project semantics and sentics in time, compare them in a parallel and dynamic way, according to different contexts and with respect to different actors and their intentions (Howard & Cambria, 2013). This will mean jumping from the Semantics Curve to the Pragmatics Curve, which will enable NLP to be more adaptive and, hence, open-domain, context-aware, and intent-driven. Intent, in particular, will be key for tasks such as sentiment analysis: a concept that generally has a negative connotation, e.g., 'small seat', might turn out to be positive, e.g., if the intent is for an infant to be safely seated in it.

While the paradigm of the Syntactics Curve is the bag-of-words model (Zellig, 1954) and the Semantics Curve is characterized by a bag-of-concepts model (Cambria & Hussain, 2012), the paradigm of the Pragmatics Curve will be the bag-of-narratives model. In this last model, each piece of text will be represented by mini-stories or interconnected episodes, leading to a more detailed level of text comprehension and sensible computation. While the bag-of-concepts model helps to overcome problems such as word-sense disambiguation and semantic role labeling, the bag-of-narratives model will enable tackling NLP issues such as co-reference resolution and textual entailment.

4. Poising on the Syntactics Curve
Today, syntax-centered NLP is still the most popular way to manage tasks such as information retrieval and extraction, auto-categorization, topic modeling, etc. Despite semantics enthusiasts having argued the importance and inevitability of a shift away from syntax for years, the vast majority of NLP researchers nowadays are still trying to keep their balance on the Syntactics Curve. Syntax-centered NLP can be broadly grouped into three main categories: keyword spotting, lexical affinity, and statistical methods.

4.1. Keyword Spotting
Keyword Spotting is the most naïve approach and probably also the most popular because of its accessibility and economy. Text is classified into categories based on the presence of fairly unambiguous words. Popular projects include: (a) Ortony's Affective Lexicon (Ortony, Clore, & Collins, 1988), which groups words into affective categories; (b) Penn Treebank (Marcus, Santorini, & Marcinkiewicz, 1994), a corpus consisting of over 4.5 million words of American English annotated for part-of-speech (POS) information; (c) PageRank (Page, Brin, Motwani, & Winograd, 1999), the famous ranking algorithm of Google; (d) LexRank (Günes & Radev, 2004), a stochastic graph-based method for computing relative importance of textual units for NLP; finally, (e) TextRank (Mihalcea & Tarau, 2004), a graph-based ranking model for text processing, based on two unsupervised methods for keyword and sentence extraction. The major weakness of keyword spotting lies in its reliance upon the presence of obvious words which are only surface features of the prose. A text document about dogs where the word 'dog' is never mentioned, e.g., because dogs are addressed according to the specific breeds they belong to, might never be retrieved by a keyword-based search engine.

4.2. Lexical Affinity
Lexical Affinity is slightly more sophisticated than keyword spotting as, rather than simply detecting obvious words, it assigns to arbitrary words a probabilistic 'affinity' for a particular category (Bush, 1999; Bybee & Scheibman, 1999; Krug, 1998; Church & Hanks, 1989; Jurafsky et al., 2000). For example, 'accident' might be assigned a 75% probability of indicating a negative event, as in 'car accident' or 'hurt in an accident'. These probabilities are usually gleaned from linguistic corpora (Kucera & Francis, 1969; Godfrey, Holliman, & McDaniel, 1992; Stevenson, Mikels, & James, 2007). Although this approach often outperforms pure keyword spotting, there are two main problems with it. First, lexical affinity operating solely on the word-level can easily be tricked by sentences such as 'I avoided an accident' (negation) and 'I met my girlfriend by accident' (connotation of unplanned but lovely surprise). Second, lexical affinity probabilities are often biased toward text of a particular genre, dictated by the source of the linguistic corpora. This makes it difficult to develop a re-usable, domain-independent model.

4.3. Statistical NLP
Statistical NLP has been the mainstream NLP research direction since the late 1990s. It relies on language models (Manning & Schütze, 1999; Hofmann, 1999; Nigam, McCallum, Thrun, & Mitchell, 2000) based on popular machine-learning algorithms such as maximum-likelihood (Berger, Della Pietra, & Della Pietra, 1996), expectation maximization (Nigam et al., 2000), conditional random fields (Lafferty, McCallum, & Pereira, 2001), and support vector machines (Joachims, 2002). By feeding a large training corpus of annotated texts to a machine-learning algorithm, it is possible for the system to not only learn the valence of keywords (as in the keyword spotting approach), but also to take into account the valence of other arbitrary keywords (like lexical affinity),
punctuation, and word co-occurrence frequencies. However, statistical methods are generally semantically weak, meaning that, with the exception of obvious keywords, other lexical or co-occurrence elements in a statistical model have little predictive value individually. As a result, statistical text classifiers only work with acceptable accuracy when given a sufficiently large text input. So, while these methods may be able to classify text on the page- or paragraph-level, they do not work well on smaller text units such as sentences or clauses.

5. Surfing the Semantics Curve
Semantics-based NLP focuses on the intrinsic meaning associated with natural language text. Rather than simply processing documents at syntax-level, semantics-based approaches rely on implicit denotative features associated with natural language text, hence stepping away from the blind usage of keywords and word co-occurrence count. Unlike purely syntactical techniques, concept-based approaches are also able to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so. Semantics-based NLP approaches can be broadly grouped into two main categories: techniques that leverage external knowledge, e.g., ontologies (taxonomic NLP) or semantic knowledge bases (noetic NLP), and methods that exploit only intrinsic semantics of documents (endogenous NLP).

5.1. Endogenous NLP
Endogenous NLP involves the use of machine-learning techniques to perform semantic analysis of a corpus by building structures that approximate concepts from a large set of documents. It does not involve prior semantic understanding of documents; instead, it relies only on the endogenous knowledge of these (rather than on external knowledge bases). The advantages of this approach over the knowledge engineering approach are effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains (Sebastiani, 2002). Endogenous NLP includes methods based either on lexical semantics, which focuses on the meanings of individual words, or compositional semantics, which looks at the meanings of sentences and longer utterances. The vast majority of endogenous NLP approaches are based on lexical semantics and include well-known machine-learning techniques. Some examples of this are: (a) latent semantic analysis (Hofmann, 2001), where documents are represented as vectors in a term space; (b) latent Dirichlet allocation (Porteous et al., 2008), which involves attributing document terms to topics; (c) MapReduce (C. Liu, Qi, Wang, & Yu, 2012), a framework that has proved to be very efficient for data-intensive tasks, e.g., large scale RDFS/OWL reasoning; and (d) genetic algorithms (D. Goldberg, 1989), probabilistic search procedures designed to work on large spaces involving states that can be represented by strings.

Works leveraging compositional semantics, instead, mainly include approaches based on Hidden Markov Models (Denoyer, Zaragoza, & Gallinari, 2001; Frasconi, Soda, & Vullo, 2001), association rule learning (Cohen, 1995; Cohen & Singer, 1999), feature ensembles (Xia, Zong, Hu, & Cambria, 2013; Poria, Gelbukh, Hussain, Das, & Bandyopadhyay, 2013) and probabilistic generative models (Lau, Xia, & Ye, 2014).

5.2. Taxonomic NLP
Taxonomic NLP includes initiatives that aim to build universal taxonomies or Web ontologies for grasping the subsumptive or hierarchical semantics associated with natural language expressions. Such taxonomies usually consist of concepts (e.g., 'painter'), instances (e.g., 'Leonardo da Vinci'), attributes and values (e.g., 'Leonardo's birthday is April 15, 1452'), and relationships (e.g., 'Mona Lisa is painted by Leonardo'). In particular, subsumptive knowledge representations build upon IsA relationships, which are usually extracted through syntactic patterns for automatic hypernym discovery (Hearst, 1992) able to infer triples such as <Pablo Picasso-IsA-artist> from stretches of text like '...artists such as Pablo Picasso...' or '...Pablo Picasso and other artists...'.

In general, attempts to build taxonomic resources are countless and include both resources crafted by human experts or community efforts, such as WordNet and Freebase (Bollacker, Evans, Paritosh, Sturge, & Taylor, 2008), and automatically built knowledge bases. Examples of such knowledge bases include: (a) WikiTaxonomy (Ponzetto & Strube, 2007), a taxonomy extracted from Wikipedia's category links; (b) YAGO (Suchanek, Kasneci, & Weikum, 2007), a semantic knowledge base derived from WordNet, Wikipedia, and GeoNames; (c) NELL (Carlson et al., 2010) (Never-Ending Language Learning), a semantic machine-learning system that is acquiring knowledge from the Web every day; finally, (d) Probase (Wu, Li, Wang, & Zhu, 2012), a research prototype that aims to build a unified taxonomy of worldly facts from 1.68 billion webpages in the Bing repository.

Other popular Semantic Web projects include: (a) SHOE (Heflin & Hendler, 1999) (Simple HTML Ontology Extensions), a knowledge representation language that allows webpages to be annotated with semantics; (b) Annotea (Kahan, 2002), an open RDF infrastructure for shared Web annotations; (c) SIOC (Breslin, Harth, Bojars, & Decker, 2005) (Semantically Interlinked Online Communities), an ontology combining terms from vocabularies that already exist with new terms needed to describe the relationships between concepts in the realm of online community sites; (d) SKOS (Miles & Bechhofer, 2009) (Simple Knowledge Organization System), an area of work developing specifications and standards to support the use of knowledge organization systems such as thesauri, classification schemes, subject heading lists and taxonomies; (e) FOAF (Brickley & Miller, 2010) (Friend Of A Friend), a project devoted
to linking people and information using the Web; (f) ISOS (Ding, Jin, Ren, & Hao, 2013) (Intelligent Self-Organizing Scheme), a scheme for the Internet of Things inspired by the endocrine regulating mechanism; finally, (g) FRED (Gangemi, Presutti, & Reforgiato, 2014), a tool that produces an event-based RDF/OWL representation of natural language text. The main weakness of taxonomic NLP is in the typicality of its knowledge bases. The way knowledge is represented in taxonomies and Web ontologies is usually strictly defined and does not allow for the combined handling of differing nuanced concepts, as the inference of semantic features associated with concepts is bound by the fixed, flat representation. The concept of "book", for example, is typically associated with concepts such as "newspaper" or "magazine", as it contains knowledge, has pages, etc. In a different context, however, a book could be used as a paperweight, a doorstop, or even as a weapon. Another key weakness of Semantic Web projects is that they are not easily scalable and, hence, not widely adopted (Gueret, Schlobach, Dentler, Schut, & Eiben, 2012). This increases the amount of time that has to pass before the initial customer feedback is even possible, and also slows down feedback loop iterations, ultimately putting Semantic Web applications at a user-experience and agility disadvantage as compared to their Web 2.0 counterparts, because their usability inadvertently takes a back seat to the number of other complex problems that have to be solved before clients even see the application.

5.3. Noetic NLP
Noetic NLP embraces all the mind-inspired approaches to NLP that attempt to compensate for the lack of domain adaptivity and implicit semantic feature inference of traditional algorithms, e.g., first principles modeling or explicit statistical modeling. Noetic NLP differs from taxonomic NLP in that it does not focus on encoding subsumption knowledge, but rather attempts to collect idiosyncratic knowledge about objects, actions, events, and people. Noetic NLP, moreover, performs reasoning in an adaptive and dynamic way, e.g., by generating context-dependent results or by discovering new semantic patterns that are not explicitly encoded in the knowledge base. Examples of noetic NLP include paradigms such as connectionist NLP (Christiansen & Chater, 1999), which models mental phenomena as emergent processes of interconnected networks of simple units, e.g., neural networks (Collobert et al., 2011); deep learning (Martinez, Bengio, & Yannakakis, 2013); sentic computing (Cambria & Hussain, 2012), an approach to concept-level sentiment analysis based on an ensemble of graph-mining and dimensionality-reduction techniques; and energy-based knowledge representation (Olsher, 2013), a novel framework for nuanced common-sense reasoning.

Besides knowledge representation and reasoning, a key aspect of noetic NLP is also semantic parsing. Most current NLP technologies rely on part-of-speech (POS) tagging, but that is unlike the way the human mind extracts meaning from text. Instead, just as the human mind does, a construction-based semantic parser (CBSP) (Cambria, Rajagopal, Olsher, & Das, 2013) quickly identifies meaningful stretches of text without requiring time-consuming phrase structure analysis. The use of constructions, defined as "stored pairings of form and function" (A. Goldberg, 2003), makes it possible to link distributed linguistic components to one another, easing extraction of semantics from linguistic structures. Constructions are composed of fixed lexical items and category-based slots, or spaces that are filled in by lexical items during text processing. An interesting example from the relevant literature would be the construction [<ACTION> <OBJECT> <DIRECTION> <OBJECT>]. Instances of this include the phrases "sneeze the napkin across the table" or "hit the ball over the fence". Constructions not only help understand how various lexical items work together to create the whole meaning, but also give the parser a sense of what categories of words are used together and thus where to expect different words.

CBSP uses this knowledge to determine constructions, their matching lexical terms, and how good each match is. Each of CBSP's constructions contributes its own unique semantics and carries a unique name. In order to choose the best possible construction for each span of text, CBSP uses knowledge about the lexical items found in text. This knowledge is obtained by looking individual lexical terms up in the knowledge bases so as to obtain information about the basic category membership of each word.

It then efficiently compares these potential memberships with the categories specified for each construction in the corpus, finding the best matches so that CBSP can extract a concept from a sentence. An example would be the extraction of the concept "buy christmas present" from the sentence "today I bought a lot of very nice Christmas gifts". Constructions are typically nested within one another: CBSP is capable of finding only those construction overlaps that are semantically sensible, based on the overall semantics of constructions and construction slot categories, thus greatly reducing the time taken to process large numbers of texts. In the big data environment, a key benefit of construction-based parsing is that only small sections of text are required in order to extract meaning; word category information and the generally small size of constructions mean that the parser can still make use of error-filled or conventionally unparseable text.

6. Foreseeing the Pragmatics Curve
Narrative understanding and generation are central for reasoning, decision-making, and sensemaking. Besides being a key part of human-to-human communication, narratives are the means by which reality is constructed and planning is conducted. Decoding how narratives are generated and processed by the human brain might eventually lead us to truly understand and explain human intelligence and consciousness.
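As a brief aside on the construction-based parsing of Section 5.3, the matching step described there (looking up each lexical item's category memberships and comparing them with the category slots of each construction) can be illustrated with a toy sketch. This is not the CBSP implementation of Cambria et al.; the mini knowledge base `CATEGORIES`, the construction name `caused-motion`, the stopword list, and the helper `match_constructions` are all hypothetical and chosen only to mirror the [<ACTION> <OBJECT> <DIRECTION> <OBJECT>] example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mini knowledge base: word -> basic category memberships.
CATEGORIES: Dict[str, Set[str]] = {
    "sneeze": {"ACTION"}, "hit": {"ACTION"},
    "napkin": {"OBJECT"}, "table": {"OBJECT"},
    "ball": {"OBJECT"}, "fence": {"OBJECT"},
    "across": {"DIRECTION"}, "over": {"DIRECTION"},
}

# Hypothetical constructions: unique name -> sequence of category slots.
CONSTRUCTIONS: Dict[str, List[str]] = {
    "caused-motion": ["ACTION", "OBJECT", "DIRECTION", "OBJECT"],
}

# Assumption made for brevity: determiners never fill a slot.
STOPWORDS = {"the", "a", "an"}


def match_constructions(text: str) -> List[Tuple[str, List[str]]]:
    """Return (construction name, slot fillers) for every span that fits."""
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    matches = []
    for name, slots in CONSTRUCTIONS.items():
        # Slide a window the size of the construction over the text.
        for start in range(len(words) - len(slots) + 1):
            span = words[start:start + len(slots)]
            # A span fits when every word belongs to the category
            # required by the slot at the same position.
            if all(slot in CATEGORIES.get(word, set())
                   for word, slot in zip(span, slots)):
                matches.append((name, span))
    return matches


print(match_constructions("sneeze the napkin across the table"))
# [('caused-motion', ['sneeze', 'napkin', 'across', 'table'])]
```

The sketch only tests exact category membership; scoring how good each match is, handling nested constructions, and mapping fillers to a concept such as "buy christmas present" are left out.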



Computational modeling is a powerful and effective way to investigate narrative understanding. A lot of the cognitive processes that lead humans to understand or generate narratives have traditionally been of interest to AI researchers under the umbrella of knowledge representation, common-sense reasoning, social cognition, learning, and NLP. Once NLP research can grasp semantics at a level comparable to human text processing, the jump to the Pragmatics Curve will be necessary, in the same way as semantic machine learning is now gradually evolving from lexical to compositional semantics.

There are already a few pioneering works that attempt to understand narratives by leveraging discourse structure (Asher & Lascarides, 2003), argument-support hierarchies (Bex, Prakken, & Verheij, 2007), plan graphs (Young, 2007), and common-sense reasoning (Mueller, 2007). One of the most representative initiatives in this context is Patrick Winston's work on computational models of narrative (Winston, 2011; Richards, Finlayson, & Winston, 2009), which is based on five key hypotheses:
The inner language hypothesis: we have an inner symbolic language that enables event description.
The strong story hypothesis: we can assemble event descriptions into stories.
The directed perception hypothesis: we can direct the resources of our perceptual faculties to answer questions using real and imagined situations.
The social animal hypothesis: we have a powerful reason to express the thought in our inner language in an external communication language.
The exotic engineering hypothesis: our brains are unlike standard left-to-right engineered systems.

Essentially, Patrick Winston believes that human intelligence stems from our unique abilities for storytelling and understanding (Finlayson & Winston, 2011). Accordingly, his recent work has focused on developing a computational system that is able to analyze narrative texts to infer non-obvious answers to questions about these texts. This has resulted in the Genesis System. Working with short story summaries provided in English, together with low-level common-sense rules and higher-level reflection patterns that are also expressed in English, Genesis has been successful in demonstrating several story understanding capabilities. One instance of this is its ability to determine that both Macbeth and the 2007 Russia-Estonia Cyberwar involve revenge, even though neither the word "revenge" nor any of its synonyms is mentioned in accounts describing those texts.

7. Discussion
Word- and concept-level approaches to NLP are just a first step towards natural language understanding. The future of NLP lies in biologically and linguistically motivated computational paradigms that enable narrative understanding and, hence, sensemaking. Computational intelligence has the potential to play an important role in NLP research. Fuzzy logic, for example, has a direct relation to NLP (Carvalho, Batista, & Coheur, 2012) for tasks such as sentiment analysis (Subasic & Huettner, 2001), linguistic summarization (Kacprzyk & Zadrozny, 2010), knowledge representation (Lai, Wu, Lin, & Huang, 2011), and word meaning inference (Kazemzadeh, Lee, & Narayanan, 2013). Artificial neural networks can aid the completion of NLP tasks such as ambiguity resolution (Chan & Franklin, 1998; Costa, Frasconi, Lombardo, & Soda, 2005), grammatical inference (Lawrence, Giles, & Fong, 2000), word representation (Luong, Socher, & Manning, 2013), and emotion recognition (Cambria, Gastaldo, Bisio, & Zunino, 2014). Evolutionary computation can be exploited for tasks such as grammatical evolution (O'Neill & Ryan, 2001), knowledge discovery (Atkinson-Abutridy, Mellish, & Aitken, 2003), text categorization (Araujo, 2004), and rule learning (Ghandar, Michalewicz, Schmidt, To, & Zurbruegg, 2009).

Despite its potential, however, the use of computational intelligence techniques in the field of NLP has so far not been very active. The first reason is that NLP is a huge field currently tackling dozens of different problems for which specific evaluation metrics exist, and it is not possible to reduce the whole field into a specific problem, as was done in early works (Novak, 1992). The second reason may be that powerful techniques such as support vector machines (Drucker, Wu, & Vapnik, 1999), kernel principal component analysis (Schölkopf et al., 1999), and latent Dirichlet allocation (Mukherjee & Blei, 2009) have achieved remarkable results on widely used NLP datasets, which are not yet met by computational intelligence techniques. All such word-based algorithms, however, are limited by the fact that they can process only the information that they can see and, hence, will sooner or later reach saturation. Computational intelligence techniques, instead, can go beyond the syntactic representation of documents by emulating the way the human brain processes natural language (e.g., by leveraging semantic features that are not explicitly expressed in text) and, hence, have higher potential to tackle complementary NLP tasks. An ensemble of computational intelligence techniques, for example, could be exploited within the same NLP model for online learning of natural language concepts (through neural networks), concept classification and semantic feature generalization (through fuzzy sets), and concept meaning evolution and continuous system optimization (through evolutionary computation).

8. Conclusion
In a Web where user-generated content has already hit critical mass, the need for sensible computation and information aggregation is increasing exponentially, as demonstrated by the mad rush in the industry for big data experts and the growth of a new Data Science discipline. The democratization of online content creation has led to the increase of Web debris, which is inevitably and negatively affecting information retrieval and extraction. To analyze this negative trend and propose possible solutions, this review paper focused on the evolution of NLP research according to three different paradigms, namely: the bag-of-words, bag-of-concepts, and bag-of-narratives models. Borrowing the concept of jumping curves from the field of business management, this survey article explained how and why NLP research is gradually shifting from lexical semantics to compositional semantics and offered insights on next-generation narrative-based NLP technology.

Jumping the curve, however, is not an easy task: the origins of human language have sometimes been called the hardest problem of science (Christiansen & Kirby, 2003). NLP technologies evolved from the era of punch cards and batch processing (in which the analysis of a natural language sentence could take up to 7 minutes (Plath, 1967)) to the era of Google and the likes of it (in which millions of webpages can be processed in less than a second). Even the most efficient word-based algorithms, however, perform very poorly if not properly trained or when contexts and domains change. Such algorithms are limited by the fact that they can process only information that they can see. Language, however, is a system where all terms are interdependent and where the value of one is the result of the simultaneous presence of the others (De Saussure, 1916). As human text processors, we see more than what we see (Davidson, 1997), in that every word activates a cascade of semantically-related concepts that enable the completion of complex NLP tasks, such as word-sense disambiguation, textual entailment, and semantic role labeling, in a quick and effortless way.

Concepts are the glue that holds our mental world together (Murphy, 2004). Without concepts, there would be no mental world in the first place (Bloom, 2003). Needless to say, the ability to organize knowledge into concepts is one of the defining characteristics of the human mind. A truly intelligent system needs physical knowledge of how objects behave, social knowledge of how people interact, sensory knowledge of how things look and taste, psychological knowledge about the way people think, and so on. Having a database of millions of common-sense facts, however, is not enough for computational natural language understanding: we will need to teach NLP systems not only how to handle this knowledge (IQ), but also how to interpret emotions (EQ) and cultural nuances (CQ).

References
[1] J. Allen, Natural Language Understanding. Redwood City, CA: Benjamin/Cummings, 1987.
[2] L. Araujo, "Symbiosis of evolutionary techniques and statistical natural language processing," IEEE Trans. Evol. Comput., vol. 8, no. 1, pp. 14–27, 2004.
[3] N. Asher and A. Lascarides, Logics of Conversation. Cambridge, U.K.: Cambridge Univ. Press, 2003.
[4] J. Atkinson-Abutridy, C. Mellish, and S. Aitken, "A semantically guided and domain independent evolutionary model for knowledge discovery from texts," IEEE Trans. Evol. Comput., vol. 7, no. 6, pp. 546–560, 2003.
[5] J. Barwise, "An introduction to first-order logic," in Handbook of Mathematical Logic. (Studies in Logic and the Foundations of Mathematics). Amsterdam, The Netherlands: North-Holland, 1977.
[6] A. Berger, V. D. Pietra, and S. D. Pietra, "A maximum entropy approach to natural language processing," Comput. Linguistics, vol. 22, no. 1, pp. 39–71, 1996.
[7] F. Bex, H. Prakken, and B. Verheij, "Formalizing argumentative story-based analysis of evidence," in Proc. Int. Conf. Artificial Intelligence Law, 2007, pp. 1–10.
[8] P. Bloom, "Glue for the mental world," Nature, vol. 421, pp. 212–213, Jan. 2003.
[9] K. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, "Freebase: A collaboratively created graph database for structuring human knowledge," in Proc. ACM SIGMOD Int. Conf. Management Data, 2008, pp. 1247–1250.
[10] J. Breslin, A. Harth, U. Bojars, and S. Decker, "Towards semantically-interlinked online communities," in The Semantic Web: Research and Applications. Berlin Heidelberg: Springer-Verlag, 2005, pp. 500–514.
[11] D. Brickley and L. Miller. (2010). FOAF vocabulary specification 0.98. Namespace Document [Online]. Available: http://xmlns.com/foaf/spec/
[12] N. Bush, "The predictive value of transitional probability for word-boundary palatalization in English," Unpublished M.S. thesis, Univ. New Mexico, Albuquerque, NM, 1999.
[13] J. Bybee and J. Scheibman, "The effect of usage on degrees of constituency: The reduction of don't in English," Linguistics, vol. 37, no. 4, pp. 575–596, 1999.
[14] E. Cambria, P. Gastaldo, F. Bisio, and R. Zunino, "An ELM-based model for affective analogical reasoning," Neurocomputing, Special Issue on Extreme Learning Machines, 2014.
[15] E. Cambria and A. Hussain, Sentic Computing: Techniques, Tools, and Applications. Dordrecht, The Netherlands: Springer-Verlag, 2012.
[16] E. Cambria, D. Rajagopal, D. Olsher, and D. Das, "Big social data analysis," in Big Data Computing, R. Akerkar, Ed. London: Chapman and Hall, 2013, pp. 401–414.
[17] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. Hruschka, and T. Mitchell, "Toward an architecture for never-ending language learning," in Proc. Conf. Artificial Intelligence AAAI, Atlanta, GA, 2010, pp. 1306–1313.
[18] J. Carvalho, F. Batista, and L. Coheur, "A critical survey on the use of fuzzy sets in speech and natural language processing," in Proc. IEEE Int. Conf. Fuzzy Systems, 2012, pp. 270–277.
[19] S. Ceccato, "Correlational analysis and mechanical translation," in Machine Translation, A. D. Booth, Ed. Amsterdam, The Netherlands: North Holland, 1967, pp. 77–135.
[20] S. Chan and J. Franklin, "Symbolic connectionism in natural language disambiguation," IEEE Trans. Neural Netw., vol. 9, no. 5, pp. 739–755, 1998.
[21] N. Chomsky, "Three models for the description of language," IRE Trans. Inform. Theory, vol. 2, no. 3, pp. 113–124, 1956.
[22] M. Christiansen and N. Chater, "Connectionist natural language processing: The state of the art," Cogn. Sci., vol. 23, no. 4, pp. 417–437, 1999.
[23] M. Christiansen and S. Kirby, "Language evolution: The hardest problem in science?" in Language Evolution, M. Christiansen and S. Kirby, Eds. Oxford, U.K.: Oxford Univ. Press, 2003, pp. 1–15.
[24] K. Church and P. Hanks, "Word association norms, mutual information, and lexicography," in Proc. 27th Annu. Meeting Association Computational Linguistics, 1989, pp. 76–83.
[25] W. Cohen, "Learning to classify English text with ILP methods," in Advances in Inductive Logic Programming, L. De Raedt, Ed. Amsterdam, The Netherlands: IOS Press, 1995, pp. 124–143.
[26] W. Cohen and Y. Singer, "Context-sensitive learning methods for text categorization," ACM Trans. Inform. Syst., vol. 17, no. 2, pp. 141–173, 1999.
[27] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa, "Natural language processing (almost) from scratch," J. Mach. Learn. Res., vol. 12, pp. 2493–2537, 2011.
[28] F. Costa, P. Frasconi, V. Lombardo, P. Sturt, and G. Soda, "Ambiguity resolution analysis in incremental parsing of natural language," IEEE Trans. Neural Netw., vol. 16, no. 4, pp. 959–971, 2005.
[29] D. Davidson, "Seeing through language," in Royal Institute of Philosophy, Supplement. Cambridge, U.K.: Cambridge Univ. Press, 1997, vol. 42, pp. 15–28.
[30] L. Denoyer, H. Zaragoza, and P. Gallinari, "HMM-based passage models for document classification and ranking," in Proc. 23rd European Colloq. Information Retrieval Research, Darmstadt, Germany, 2001.
[31] F. de Saussure, Cours de Linguistique Générale. Paris: Payot, 1916.
[32] Y. Ding, Y. Jin, L. Ren, and K. Hao, "An intelligent self-organization scheme for the Internet of things," IEEE Comput. Intell. Mag., vol. 8, no. 3, pp. 41–53, 2013.
[33] H. Drucker, D. Wu, and V. Vapnik, "Support vector machines for spam categorization," IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 1048–1054, 1999.
[34] M. Dyer, "Connectionist natural language processing: A status report," in Computational Architectures Integrating Neural and Symbolic Processes, R. Sun and L. Bookman, Eds. Dordrecht, The Netherlands: Kluwer Academic, 1995, vol. 292, pp. 389–429.
[35] N. Eagle, P. Singh, and A. Pentland, "Common sense conversations: Understanding casual conversation using a common sense database," in Proc. Int. Joint Conf. Artificial Intelligence, 2003.
[36] C. Fellbaum, WordNet: An Electronic Lexical Database (language, speech, and communication). Cambridge, MA: The MIT Press, 1998.
[37] M. Finlayson and P. Winston, "Narrative is a key cognitive competency," in Proc. 2nd Annu. Meeting Biologically Inspired Cognitive Architectures, 2011, p. 110.
[38] P. Frasconi, G. Soda, and A. Vullo, "Text categorization for multi-page documents: A hybrid naive Bayes HMM approach," J. Intell. Inform. Syst., vol. 18, nos. 2–3, pp. 195–217, 2001.
[39] A. Gangemi, V. Presutti, and D. Reforgiato, "Frame-based detection of opinion holders and topics: A model and a tool," IEEE Comput. Intell. Mag., vol. 9, no. 1, pp. 20–30, 2014.
[40] A. Ghandar, Z. Michalewicz, M. Schmidt, T. To, and R. Zurbruegg, "Computational intelligence for evolving trading rules," IEEE Trans. Evol. Comput., vol. 13, no. 1, pp. 71–86, 2009.
[41] J. Godfrey, E. Holliman, and J. McDaniel, "Switchboard: Telephone speech corpus for research and development," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 1992, pp. 517–520.



[42] A. Goldberg, "Constructions: A new theoretical approach to language," Trends Cogn. Sci., vol. 7, no. 5, pp. 219–224, 2003.
[43] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.
[44] C. Gueret, S. Schlobach, K. Dentler, M. Schut, and G. Eiben, "Evolutionary and swarm computing for the semantic Web," IEEE Comput. Intell. Mag., vol. 7, no. 2, pp. 16–31, 2012.
[45] E. Güneş and D. Radev, "LexRank: Graph-based lexical centrality as salience in text summarization," J. Artif. Intell. Res., vol. 22, no. 1, pp. 457–479, 2004.
[46] K. Hayden, D. Novy, C. Havasi, M. Bove, S. Alfaro, and R. Speer, "Narratarium: An immersive storytelling environment," in Proc. Human-Computer Interaction, 2013, vol. 374, pp. 536–540.
[47] M. Hearst, "Automatic acquisition of hyponyms from large text corpora," in Proc. 14th Conf. Computational Linguistics, 1992, pp. 539–545.
[48] J. Heflin and J. Hendler, "Shoe: A knowledge representation language for internet applications," Univ. Maryland, College Park, Maryland, Tech. Rep., 1999.
[49] T. Hofmann, "Probabilistic latent semantic indexing," in Proc. 22nd Annu. Int. ACM SIGIR Conf. Research Development Information Retrieval, 1999, pp. 50–57.
[50] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Machine Learn., vol. 42, nos. 1–2, pp. 177–196, 2001.
[51] N. Howard and E. Cambria, "Intention awareness: Improving upon situation awareness in human-centric environments," Human-Centric Computing Information Sciences, vol. 3, no. 9. Cambridge, MA: Springer-Verlag, 2013.
[52] N. Imparato and O. Harari, Jumping the Curve: Innovation and Strategic Choice in An Age of Transition. San Francisco, CA: Jossey-Bass, 1996.
[53] P. Jackson and I. Moulinier, Natural Language Processing for Online Applications: Text Retrieval, Extraction and Categorization. Philadelphia, PA: John Benjamins, 1997.
[54] T. Joachims, Learning To Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Norwell, MA: Kluwer Academic, 2002.
[55] K. Jones and J. Galliers, "Evaluating natural language processing systems: An analysis and review," Comput. Linguistics, vol. 24, no. 2, 1995.
[56] D. Jurafsky, A. Bell, M. Gregory, W. Raymond, J. Bybee, and P. Hopper, Probabilistic Relations Between Words: Evidence From Reduction In Lexical Production. Amsterdam, The Netherlands: John Benjamins, 2000.
[57] J. Kacprzyk and S. Zadrozny, "Computing with words is an implementable paradigm: Fuzzy queries, linguistic data summaries, and natural-language generation," IEEE Trans. Fuzzy Syst., vol. 18, no. 3, pp. 461–472, 2010.
[58] J. Kahan, "Annotea: An open RDF infrastructure for shared web annotations," Comput. Netw., vol. 39, no. 5, pp. 589–608, 2002.
[59] A. Kazemzadeh, S. Lee, and S. Narayanan, "Fuzzy logic models for the meaning of emotion words," IEEE Comput. Intell. Mag., vol. 8, no. 2, pp. 34–49, 2013.
[60] M. Krug, "String frequency: A cognitive motivating factor in coalescence, language processing, and linguistic change," J. Eng. Linguistics, vol. 26, no. 4, pp. 286–320, 1998.
[61] H. Kucera and N. Francis, "Computational analysis of present-day American English," Int. J. Amer. Linguistics, vol. 35, no. 1, pp. 71–75, 1969.
[62] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. 18th Int. Conf. Machine Learning, 2001, pp. 282–289.
[63] L. Lai, C. Wu, P. Lin, and L. Huang, "Developing a fuzzy search engine based on fuzzy ontology and semantic search," in Proc. IEEE Int. Conf. Fuzzy Systems, Taipei, Taiwan, 2011, pp. 2684–2689.
[64] R. Lau, Y. Xia, and Y. Ye, "A probabilistic generative model for mining cybercriminal networks from online social media," IEEE Comput. Intell. Mag., vol. 9, no. 1, pp. 31–43, 2014.
[65] S. Lawrence, C. Giles, and S. Fong, "Natural language grammatical inference with recurrent neural networks," IEEE Trans. Knowledge. Data Eng., vol. 12, no. 1, pp. 126–140, 2000.
[66] D. Lenat and R. Guha, Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Boston, MA: Addison-Wesley, 1989.
[67] C. Liu, G. Qi, H. Wang, and Y. Yu, "Reasoning with large scale ontologies in fuzzy pD* using mapreduce," IEEE Comput. Intell. Mag., vol. 7, no. 27, pp. 54–66, 2012.
[68] H. Liu, H. Lieberman, and T. Selker, "A model of textual affect sensing using real-world knowledge," in Proc. 8th Int. Conf. Intelligent User Interfaces, 2003, pp. 125–132.
[69] M. Luong, R. Socher, and C. Manning, "Better word representations with recursive neural networks for morphology," in Proc. Conf. Natural Language Learning, 2013.
[70] R. Magritte, "Les mots et les images," La Révolution surréaliste, no. 12, 1929.
[71] K. Mahesh, S. Nirenburg, and A. Tucker, Knowledge-Based Systems for Natural Language Processing. Boca Raton, FL: CRC Press, 1997.
[72] C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 1999.
[73] M. Marcus, B. Santorini, and M. Marcinkiewicz, "Building a large annotated corpus of english: The penn treebank," Comput. Linguistics, vol. 19, no. 2, pp. 313–330, 1994.
[74] H. Martinez, Y. Bengio, and G. Yannakakis, "Learning deep physiological models of affect," IEEE Comput. Intell. Mag., vol. 8, no. 2, pp. 20–33, 2013.
[75] D. McGuinness and F. Van Harmelen, "OWL web ontology language overview," W3C recommendation, 2004.
[76] R. Mihalcea and P. Tarau, "TextRank: Bringing order into texts," in Proc. Conf. Empirical Methods Natural Language Processing, Barcelona, 2004.
[77] A. Miles and S. Bechhofer, "SKOS simple knowledge organization system reference," W3C Recommendation, Tech. Rep. 2009.
[78] M. Minsky, Semantic Information Processing. Cambridge, MA: MIT Press, 1968.
[79] M. Minsky, The Society of Mind. New York: Simon and Schuster, 1986.
[80] E. Mueller, Natural Language Processing with ThoughtTreasure. New York: Signiform, 1998.
[81] E. Mueller, "Modeling space and time in narratives about restaurants," Literary Linguistic Comput., vol. 22, no. 1, pp. 67–84, 2007.
[82] I. Mukherjee and D. Blei, "Relative performance guarantees for approximate inference in latent dirichlet allocation," in Proc. Neural Information Processing Systems, Vancouver, BC, 2009, pp. 1129–1136.
[83] G. Murphy, The Big Book of Concepts. Cambridge, MA: MIT Press, 2004.
[84] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell, "Text classification from labeled and unlabeled documents using EM," Machine Learn., vol. 39, nos. 2–3, pp. 103–134, 2000.
[85] V. Novak, "Fuzzy sets in natural language processing," in An Introduction to Fuzzy Logic Applications in Intelligent Systems, Yager Ed. Norwell, MA: Kluwer Academic, 1992, pp. 185–200.
[86] D. Olsher, "COGVIEW & INTELNET: Nuanced energy-based knowledge representation and integrated cognitive-conceptual framework for realistic culture, values, and concept-affected systems simulation," in Proc. 2013 IEEE Symp. Computational Intelligence Human-Like Intelligence, Singapore, 2013, pp. 82–91.
[87] M. O'Neill and C. Ryan, "Grammatical evolution," IEEE Trans. Evol. Comput., vol. 5, no. 4, pp. 349–358, 2001.
[88] A. Ortony, G. Clore, and A. Collins, The cognitive structure of emotions, Cambridge, U.K.: Cambridge Univ. Press, 1988.
[89] L. Page, S. Brin, R. Motwani, and T. Winograd, "The pagerank citation ranking: bringing order to the web," Stanford Univ., Stanford, CA, Tech. Rep., 1999.
[90] J. Pearl, "Bayesian networks: A model of self-activated memory for evidential reasoning," UCLA comput. Sci., Irvine, CA: Tech. Rep. CSD-850017, 1985.
[91] W. Plath, "Multiple path analysis and automatic translation," in Machine Translation, A. D. Booth, Ed. Amsterdam, The Netherlands: North-Holland, 1967, pp. 267–315.
[92] S. Ponzetto and M. Strube, "Deriving a large-scale taxonomy from Wikipedia," in Proc. AAAI'07 22nd Nat. Conf. Artificial Intelligence, Vancouver, BC, 2007, pp. 1440–1445.
[93] S. Poria, A. Gelbukh, A. Hussain, D. Das, and S. Bandyopadhyay, "Enhanced SenticNet with affective labels for concept-based opinion mining," IEEE Intell. Syst., vol. 28, no. 2, pp. 31–38, 2013.
[94] I. Porteous, I. Newman, A. Ihler, A. Asuncion, P. Smyth, and M. Welling, "Fast collapsed Gibbs sampling for latent dirichlet allocation," in Proc. 14th ACM SIGKDD Int. Conf. Knowledge Discovery Data Mining, 2008, pp. 569–577.
[95] R. Quillian, "A notation for representing conceptual information: An application to semantics and mechanical english paraphrasing," System Development Corp., Santa Monica, California, Tech. Rep. SP-1395, 1963.
[96] R. Reiter, "A logic for default reasoning," Artificial Intell., vol. 13, pp. 81–132, 1980.
[97] W. Richards, M. Finlayson, and P. Winston, "Advancing computational models of narrative," MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, Tech. Rep. 2009-063, 2009.
[98] R. Schank, Conceptual Information Processing. Amsterdam, The Netherlands: Elsevier Science Inc., 1975.
[99] B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola, "Input space versus feature space in kernel-based methods," IEEE Trans. Neural Netw., vol. 10, no. 5, pp. 1000–1017, 1999.
[100] F. Sebastiani, "Machine learning in automated text categorization," ACM Comput. Surv., vol. 34, no. 1, pp. 1–47, 2002.
[101] R. Simmons, "Synthetic language behavior," Data Processing Manage., vol. 5, no. 12, pp. 11–18, 1963.
[102] P. Singh. (2002). The open mind common sense project. [Online]. Available: http://www.kurzweilai.net/
[103] J. Sowa, "Semantic networks," in Encyclopedia of Artificial Intelligence, S. Shapiro, Ed. New York: Wiley, 1987.
[104] R. Stevenson, J. Mikels, and T. James, "Characterization of the affective norms for english words by discrete emotional categories," Behav. Res. Methods, vol. 39, no. 4, pp. 1020–1024, 2007.
[105] P. Subasic and A. Huettner, "Affect analysis of text using fuzzy semantic typing," IEEE Trans. Fuzzy Syst., vol. 9, no. 4, pp. 483–496, 2001.
[106] F. Suchanek, G. Kasneci, and G. Weikum, "Yago: A core of semantic knowledge," in Proc. 16th Int. World Wide Web Conf., 2007, pp. 697–706.
[107] P. Winston, "The strong story hypothesis and the directed perception hypothesis," in Proc. AAAI Fall Symp.: Advances Cognitive Systems, 2011.
[108] W. Wu, H. Li, H. Wang, and K. Zhu, "Probase: A probabilistic taxonomy for text understanding," in Proc. ACM SIGMOD Int. Conf. Management Data, Scottsdale, AZ, 2012, pp. 481–492.
[109] R. Xia, C. Zong, X. Hu, and E. Cambria, "Feature ensemble plus sample selection: A comprehensive approach to domain adaptation for sentiment classification," IEEE Intell. Syst., vol. 28, no. 3, pp. 10–18, 2013.
[110] R. Young, "Story and discourse: A bipartite model of narrative generation in virtual worlds," Interaction Studies, vol. 8, pp. 177–208, 2007.
[111] H. Zellig, "Distributional structure," Word, vol. 10, pp. 146–162, 1954.
