
Overview

Artificial intelligence
Gheorghe Tecuci

Artificial intelligence (AI) is the Science and Engineering domain concerned with
the theory and practice of developing systems that exhibit the characteristics
we associate with intelligence in human behavior. Starting with a brief history
of artificial intelligence, this article presents a general overview of this broad
interdisciplinary field, organized around the main modules of the notional
architecture of an intelligent agent (knowledge representation; problem solving
and planning; knowledge acquisition and learning; natural language, speech,
and vision; action processing and robotics) which highlights both the main areas
of artificial intelligence research, development and application, and also their
integration. © 2011 Wiley Periodicals, Inc.

How to cite this article:


WIREs Comput Stat 2012, 4:168–180. doi: 10.1002/wics.200

Keywords: artificial intelligence; intelligent agent; knowledge representation;


problem solving; learning; natural language processing; applications of artificial
intelligence

Correspondence to: tecuci@gmu.edu
Learning Agents Center and Computer Science Department, George Mason University, Fairfax, VA, USA

INTRODUCTION

Artificial intelligence (AI) is the Science and Engineering domain concerned with the theory and practice of developing systems that exhibit the characteristics we associate with intelligence in human behavior, such as perception, natural language processing, problem solving and planning, learning and adaptation, and acting on the environment. Its main scientific goal is understanding the principles that enable intelligent behavior in humans, animals, and artificial agents. This scientific goal directly supports several engineering goals, such as developing intelligent agents, formalizing knowledge and mechanizing reasoning in all areas of human endeavor, making working with computers as easy as working with people, and developing human-machine systems that exploit the complementarity of human and automated reasoning.

Artificial intelligence is a very broad interdisciplinary field which has roots in and intersects with many domains, not only all the computing disciplines, but also mathematics, linguistics, psychology, neuroscience, mechanical engineering, statistics, economics, control theory and cybernetics, philosophy, and many others. It has adopted many concepts and methods from these domains, but it has also contributed back.

While some of the developed systems, such as an expert or a planning system, can be characterized as pure applications of AI, most of the AI systems are developed as components of complex applications to which they add intelligence in various ways, for instance, by enabling them to reason with knowledge, to process natural language, or to learn and adapt.

It has become common to describe an AI system using the agent metaphor (Ref 1, pp. 34–63). Figure 1 shows a notional architecture of an intelligent agent which identifies its main components. In essence, an agent is a knowledge-based system that perceives its environment (which may be the physical world, a user via a graphical user interface, a collection of other agents, the Internet, or another complex environment); reasons to interpret perceptions, draw inferences, solve problems, and determine actions; and acts upon that environment to realize a set of goals or tasks for which it has been designed. Additionally, the agent will continuously improve its knowledge and performance through learning from input data, from a user, from other agents, and/or from its own problem solving experience. While interacting with a human or some other agents, it may not blindly obey commands, but may have the ability to modify requests, ask clarification questions, or even refuse


to satisfy certain requests. It can accept high-level requests indicating what the user wants and can decide how to satisfy each request with some degree of independence or autonomy, exhibiting goal-directed behavior and dynamically choosing which actions to take, and in what sequence. It can collaborate with users to improve the accomplishment of their tasks or can carry out such tasks on their behalf, based on knowledge of their goals or desires. It can monitor events or procedures for the users, can advise them on performing various tasks, can train or teach them, or can help them collaborate (Ref 2, pp. 1–12).
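To make this notional architecture concrete, the following minimal sketch shows how the modules of Figure 1 might be wired into a perceive-reason-act-learn loop. All names (Agent, perceive, reason, act, learn, and the environment methods) are hypothetical and only illustrate the idea, not an actual implementation from the article.

```python
# A minimal, illustrative sketch of the agent loop suggested by Figure 1.
# All class and method names are hypothetical.

class Agent:
    def __init__(self, knowledge_base, problem_solver, learner):
        self.kb = knowledge_base          # knowledge base module
        self.solver = problem_solver      # problem solving engine
        self.learner = learner            # learning engine

    def perceive(self, environment):
        # perceptual processing: turn sensory input into internal symbols
        return environment.sense()

    def reason(self, percepts):
        # interpret percepts and choose actions using the knowledge base
        return self.solver.decide(percepts, self.kb)

    def act(self, environment, actions):
        # action processing: change the environment
        for action in actions:
            environment.execute(action)

    def learn(self, percepts, actions, feedback):
        # improve the knowledge base from experience
        self.learner.update(self.kb, percepts, actions, feedback)

    def run(self, environment, steps=100):
        for _ in range(steps):
            percepts = self.perceive(environment)
            actions = self.reason(percepts)
            self.act(environment, actions)
            self.learn(percepts, actions, environment.feedback())
```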
Most of the current AI agents, however, do not have all the components from Figure 1, or some of the components have very limited functionality. For example, a user may speak with an automated agent (representing her Internet service provider) that will guide her in troubleshooting her Internet connection. The agent may have advanced speech, natural language, and reasoning capabilities, but no visual or learning capabilities. A natural language interface to a database may only have natural language processing capabilities, while a face recognition system may only have learning and visual perception capabilities.

Artificial intelligence researchers investigate powerful techniques in their quest for realizing intelligent behavior. But these techniques are pervasive and are no longer considered AI when they reach mainstream use. Examples include time-sharing, symbolic programming languages (e.g., Lisp, Prolog, Scheme), symbolic mathematics systems (e.g., Mathematica), graphical user interfaces, computer games, object-oriented programming, the personal computer, email, hypertext, and even software agents. While this tends to diminish the merits of AI, the field is continuously producing new results and, due to its current level of maturity and the increased availability of cheap computational power, it is a key technology in many of today's novel applications.

The next section provides a brief history of the evolution of AI. This is followed by short presentations of its main areas of research, which correspond to the agent modules from Figure 1.

FIGURE 1 | The main modules of a knowledge-based agent. (Modules: perceptual processing of the sensory input from the environment; a problem solving engine and a reasoning area operating on the knowledge base; a learning engine; and action processing producing the action output to the environment.)

BRIEF HISTORY OF ARTIFICIAL INTELLIGENCE

Artificial intelligence is as old as computer science since, from the very beginning, computer science researchers were interested in developing intelligent computer systems.3 The name artificial intelligence was proposed by John McCarthy when he and other influential AI figures (Marvin Minsky, Allen Newell, Herbert Simon, and others) organized a summer workshop at Dartmouth in 1956.

Early work in artificial intelligence focused on simple toy domains and produced some very impressive results. Newell and Simon developed a theorem proving system that was able to demonstrate most of the theorems in Chapter 2 of Russell and Whitehead's Principia Mathematica (Ref 1, pp. 17–18). Arthur Samuel developed a checker playing program that was trained by playing against itself, by playing against people, and by following book games. After training, the memory contained roughly 53,000 positions, and the program became a 'rather better-than-average novice, but definitely not an expert' (Ref 4, p. 217), demonstrating that significant and measurable learning can result from rote learning alone. Minsky's students developed systems that demonstrated several types of intelligent behavior for problem solving, vision, natural language understanding, learning and planning, in simplified domains known as microworlds, such as the one consisting of solid blocks on a tabletop. Robinson5 developed the resolution method which, theoretically, can prove any theorem in first-order logic.

These successes generated much enthusiasm and the expectation that AI would soon create machines that think, learn, and create at levels surpassing even human intelligence. However, attempts to apply the developed methods to complex real-world problems have consistently ended in spectacular failures. A famous example is the automatic translation of the phrase 'the spirit is willing but the flesh is weak' into Russian, and then back into English, as 'the vodka is good but the meat is rotten' (Ref 1, p. 21). This led to an 'AI winter' when previously generous funding for AI research was significantly reduced.

Why did early AI systems fail to scale up to solve complex real-world problems? One reason is that most of them knew almost nothing about their subject matter. They solved problems by trying all the possible combinations of steps until a solution was found, and were successful because the search


space was very small. It was realized that, in order to solve complex real-world problems, a system would need huge amounts of knowledge, as well as heuristics to limit the search for solutions in large problem spaces. It was also realized that building an intelligent agent is very difficult because the cognitive functions to be automated are not understood well enough. This has led AI researchers to focus on individual cognitive processes, such as learning, and on studying elementary problems in depth, such as concept learning. The consequence was the split of artificial intelligence into many different areas, including knowledge representation, search, game playing, theorem proving, planning, probabilistic reasoning, learning, natural language processing, vision, robotics, neural networks, genetic algorithms, and others. Each of these areas has established its own research community, with its own conferences and journals, and limited communication with the research communities in other areas. Another split has occurred with respect to the general approach to be used in developing an intelligent system. One is the symbolic approach, which relies on the representation of knowledge in symbolic structures and on logic-based reasoning with these structures. The other is the subsymbolic approach, which focuses on duplicating the signal-processing and control abilities of simpler animals, using brain-inspired neural networks, biologically inspired genetic algorithms, or fuzzy logic.

Research in each of these narrower domains facilitated significant progress and produced many successful applications. One of the first successes, which also marks the beginning of the AI industry, was the development and proliferation of expert systems. An expert system incorporates a large amount of domain and human problem solving expertise in a specific area, such as medical diagnosis, engineering design, military planning, or intelligence analysis, allowing it to perform a task that would otherwise be performed by a human expert.6,7

The increasing availability of large data sets, such as the World Wide Web or the genomic sequences, as well as the increased computational power available, has created opportunities for new AI methods that rely more on data than on algorithms. For example, the traditional approach to answering a natural language query from a data repository emphasized deep understanding of the query, which is a very complex task. But when the repository is as large as the World Wide Web, one may simply provide a template for the answer, it being very likely that it will be matched by some information on the web.

Progress in various areas of AI has led to a renewed interest in developing agents that integrate multiple cognitive functions. This, in turn, has led to an understanding that the various approaches and methods developed in the isolated subfields of AI (natural language processing, knowledge representation, problem solving and planning, machine learning, robotics, computer vision, etc.) need to be interoperable to both facilitate and take advantage of their integration. This has also led to an understanding that the symbolic and subsymbolic approaches to AI are not competing but complementary, and both may be needed in an agent. The result was the development of agent architectures, such as ACT-R,8 SOAR,9 and Disciple,10 the development of agents for different types of applications (including agents for the WWW, search and recommender agents), robots, and multi-agent systems (for instance, an intelligent house).

Another aspect of reintegration and interoperability is that algorithms developed in one area are used to improve another area. An example is the use of probabilistic reasoning and machine learning in statistical natural language processing.11

The next sections will briefly review some of the main areas of AI, as identified by the modules in Figure 1. The goal is to provide an intuitive understanding of each area, its methods, and its applications.

KNOWLEDGE REPRESENTATION

An intelligent agent has an internal representation of its external environment which allows it to reason about the environment by manipulating the elements of the representation. For each relevant aspect of the environment, such as an object, a relation between objects, a class of objects, a law, or an action, there is an expression in the agent's knowledge base which represents that aspect. For example, Figure 2 shows one way to represent the situation shown in its upper-right side. The upper part of Figure 2 is a hierarchical representation of the objects and their relationships (an ontology). Under it is a rule to be used for reasoning about these objects. This mapping between real entities and their representations allows the agent to reason about the environment by manipulating its internal representations and creating new ones. For example, by employing natural deduction and its modus ponens rule, the agent may infer that cup1 is on table1. The actual algorithm that implements natural deduction is part of the problem solving engine, while the actual reasoning is performed in the reasoning area (Figure 1).
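As a rough illustration of this kind of rule-based inference (an illustrative sketch, not the article's actual implementation), the following code applies the transitivity rule from Figure 2 to the facts of the example and derives that cup1 is on table1:

```python
# Illustrative forward application of the rule
#   for all x, y, z in object: (on x y) & (on y z) -> (on x z)
# to the facts represented in Figure 2.

facts = {("on", "cup1", "book1"), ("on", "book1", "table1")}

def apply_transitivity(facts):
    """Repeatedly apply the transitivity rule until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new_facts = set()
        for (_, x, y1) in derived:
            for (_, y2, z) in derived:
                if y1 == y2 and ("on", x, z) not in derived:
                    new_facts.add(("on", x, z))
        if new_facts:
            derived |= new_facts
            changed = True
    return derived

print(apply_transitivity(facts))
# The derived set now also contains ("on", "cup1", "table1").
```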
This simple example illustrates an important architectural characteristic of an intelligent agent, the separation between knowledge and control,


represented in Figure 1 by separate modules for the knowledge base and the problem solving engine. While the knowledge base contains the data structures that represent the entities from the environment (as shown in Figure 2), the inference engine implements general methods of solving input problems based on the knowledge from the knowledge base, as will be discussed in the next section.

FIGURE 2 | A situation and its representation. (Ontology: cup, book, and table are subclasses of vessel, publication, and furniture, respectively, which are subclasses of object; cup1, book1, and table1 are instances of cup, book, and table; cup1 is on book1, which is on table1. Rule: for all x, y, z in object, (on x y) & (on y z) → (on x z).)

When designing the knowledge representation for an intelligent agent, one has to consider four important characteristics.12 The first is the representational adequacy, which characterizes the ability to represent the knowledge needed in a certain application domain. The second is the inferential adequacy, which denotes the ability to represent the inferential procedures needed to manipulate the representational structures to infer new knowledge. The third is the problem solving efficiency, characterizing the ability to represent efficient problem solving procedures. Finally, there is the learning efficiency, characterizing the ability to acquire and learn new knowledge and to integrate it within the agent's knowledge structures, as well as to modify the existing knowledge structures to better represent the application domain.

Since no representation has yet been found that is optimal with respect to all of the above characteristics, several knowledge representation systems have been developed.13,14 Most of them are based on logic. For example, predicate calculus15–17 has a high representational and inferential adequacy, but a low problem solving efficiency. The complexity of first-order predicate calculus representation makes it very difficult to implement learning methods, and they are not efficient. Therefore, most of the existing learning methods are based on restricted forms of first-order logic or even on propositional logic. However, new knowledge can be easily integrated into the existing knowledge due to the modularity of the representation. Thus, the learning efficiency of predicate calculus is moderate.

Production rules,8,9,16,18 which represent knowledge in the form of situation-action pairs, possess similar features. They are particularly well-suited for representing knowledge about what to do in certain situations (e.g., if the car does not start, then check the gas), and are used in many agents. However, they are less adequate for representing knowledge about objects.

Semantic networks, frames, and ontologies14,19–24 are, to a large extent, complementary to production systems. They are particularly well suited for representing objects and states, but have difficulty in representing processes. As opposed to production systems, their inferential efficiency is very high because the structure used for representing knowledge is also a guide for the retrieval of knowledge. However, their learning efficiency is low because the knowledge that is added or deleted affects the rest of the knowledge. Therefore, new knowledge has to be carefully integrated into the existing knowledge.

In response to these complementary characteristics, many agents use hybrid representations, such as a combination of ontology and rules, as shown in Figure 2.

Probabilistic representations have been introduced to cope with the uncertainty that derives from a simplified representation of the world, and to enable reasoning with evidence, through which many agents experience the world. For example, Figure 3 shows a Bayesian network due to Judea Pearl. It represents the prior probabilities of specific events (e.g., the prior probability of a burglary at Bob's house is 0.002, and the prior probability of an earthquake is 0.005), and the causal relationships between events (both a burglary and an earthquake cause the house alarm to be set off with certain probabilities, and the alarm being set off causes John and Mary to call Bob with certain probabilities). Using the representation in Figure 3, one may infer, for example, the probability that a burglary has occurred, assuming that both John and Mary called Bob.
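The following sketch computes such a posterior by brute-force enumeration over the network's joint distribution. It is illustrative only; the numbers are those shown in Figure 3, and the ordering of the conditional probability table entries over the parents' truth values is an assumption about how the figure should be read.

```python
from itertools import product

# Bayesian network of Figure 3 (illustrative inference by enumeration).
# The CPT ordering over the parents' truth values is assumed.
P_B = 0.002                                # P(Burglary)
P_E = 0.005                                # P(Earthquake)
P_A = {(True, True): 0.96, (True, False): 0.93,
       (False, True): 0.21, (False, False): 0.01}   # P(Alarm | B, E)
P_J = {True: 0.93, False: 0.06}            # P(JohnCalls | Alarm)
P_M = {True: 0.65, False: 0.01}            # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Joint probability of one complete assignment of the five variables."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(Burglary | JohnCalls = true, MaryCalls = true) by enumeration
num = sum(joint(True, e, a, True, True) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True, True) for b, e, a in product([True, False], repeat=3))
print(num / den)   # posterior probability of a burglary given both calls
```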
As in the case of logical representation systems, several probabilistic representation systems have been developed, such as Bayesian,25 Baconian,26 belief functions,27 and fuzzy,28 because none of them can cope with all the characteristics of evidence, which is always incomplete, usually inconclusive, frequently ambiguous, commonly dissonant, and has various degrees of believability.29

Recent research focuses on the more formal representation of the information on the web to


facilitate its processing by automated agents, such as the development of the Ontology Web Language.30

FIGURE 3 | Bayesian network. (Nodes: 'Burglary at Bob's house', with P(B) = 0.002; 'Earthquake', with P(E) = 0.005; both cause 'Alarm set off at Bob's house', with P(A|B,E) = 0.96, P(A|B,¬E) = 0.93, P(A|¬B,E) = 0.21, P(A|¬B,¬E) = 0.01; the alarm causes 'John calls Bob', with P(J|A) = 0.93, P(J|¬A) = 0.06, and 'Mary calls Bob', with P(M|A) = 0.65, P(M|¬A) = 0.01.)

FIGURE 4 | Problem solving as search. (A search graph leading from the initial state I, through intermediate states such as S3, S4, S6, S45, and S47, to a goal state G; the arcs are labeled with the operators O1 to O7 that transform one state into another.)
PROBLEM SOLVING AND PLANNING

Artificial intelligence has developed general methods for theorem proving, problem solving and planning, such as resolution, state space search, adversarial search, problem reduction, constraint satisfaction, and case-based reasoning. One important characteristic of these methods is the use of heuristic information that guides the search for solutions in large problem spaces. While heuristics never guarantee optimal solutions, or even finding a solution, useful heuristics lead to solutions that are good enough most of the time.

In state space search, a problem P is represented as an initial state I, a set O of operators (each transforming a state into a successor state), and a set G of goal states. A solution of the problem P is a finite sequence of applications of operators, such as (O4, O5, O1, O3, O2), that change the initial state into one of the goal states, as shown in Figure 4. Consider, for example, a robot that can manipulate the objects from Figure 2. We may ask this robot to bring us the book. The robot needs to find a sequence of actions that transforms the initial state I shown in Figure 2 into a state G where we have the book in our hands, such as: pick-up cup1, place cup1 on table1, pick-up book1, etc. The definitions of all the actions that the robot can perform (e.g., pick-up, place, etc.), with their applicability conditions and their effects on the state of the world, are represented in the knowledge base of the robot. The actual algorithm that applies these operators in order to build the search tree in Figure 4 is part of the inference engine. The actual tree is built in the reasoning area (Figure 1).

Many algorithms have been developed to solve a problem represented as state space search, including breadth-first search, depth-first search, uniform cost search, iterative deepening depth-first search, greedy best-first search, A*, hill-climbing, simulated annealing, and genetic algorithms (Ref 1, pp. 59–193). A major difficulty is the size of the search space, which makes the use of an exhaustive search unfeasible for real-world problems. The algorithms therefore need to use domain-specific heuristic information that guides them in considering only some of the successors of a node, in a certain order.

This general approach to problem solving has been applied to a wide range of real-world problems, including route finding in computer networks, automated travel advisory systems, airline travel planning systems, planning movements for automatic circuit board drilling, VLSI layout on the chip and channel routing between the cells, robot navigation, and robot assembly.
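To make the state space formulation concrete, here is a small, illustrative greedy best-first search over an explicit graph. The graph, the heuristic values, and the function names are invented for the example and are not taken from the article.

```python
import heapq

# Illustrative greedy best-first search: always expand the node whose heuristic
# value h(n) (estimated distance to a goal) is smallest.
def best_first_search(initial, goals, successors, h):
    frontier = [(h(initial), initial, [initial])]   # (h value, state, path so far)
    visited = set()
    while frontier:
        _, state, path = heapq.heappop(frontier)
        if state in goals:
            return path
        if state in visited:
            continue
        visited.add(state)
        for op, next_state in successors(state):
            if next_state not in visited:
                heapq.heappush(frontier, (h(next_state), next_state, path + [next_state]))
    return None   # no solution found

# A toy state space in the spirit of Figure 4 (names and values are hypothetical).
graph = {"I": [("O3", "S3"), ("O4", "S4"), ("O6", "S6")],
         "S4": [("O5", "S45"), ("O7", "S47")],
         "S45": [("O1", "G")], "S3": [], "S6": [], "S47": []}
h_values = {"I": 3, "S3": 4, "S4": 2, "S6": 4, "S45": 1, "S47": 3, "G": 0}

path = best_first_search("I", {"G"}, lambda s: graph.get(s, []), lambda s: h_values[s])
print(path)   # ['I', 'S4', 'S45', 'G']
```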
In adversarial search, which is used by game playing agents, I is the initial board position and the players alternate in selecting the move to make. Before deciding on the next move, the first player projects the game as far as possible into the future. It considers all the possible moves it can make in the initial position (i.e., O3, O4, and O6 in Figure 4). Each such move (e.g., O4) would change the game board into a new position (S4) where it is the turn of the adversary to move. Thus, the first player now considers all the possible moves that the adversary can make (i.e., O5 and O7 in state S4), then all its possible responses, and


so on. This continues until states are reached which represent end positions in the game (i.e., win, lose, or draw). Then, starting bottom-up, the first player determines the value (win, draw, or lose) of each intermediate node, based on how the game will end from that node. After all this projection is made, the first player is ready to select, as its first move, the one that leads to the board position having the best result. If both players choose their best moves, the game will end with that result.

This approach, known as minimax, or its more advanced alpha-beta version, enables the agent player to select the best move. The problem, however, is that the search space is huge for any non-trivial game. In the case of checkers, for instance, it has been estimated that a complete game tree has around 10^40 nonterminal nodes. If one assumes that these nodes are generated at a rate of 3 billion per second, the generation of the whole tree would still require around 10^21 centuries! (Ref 4, p. 211). The search space for chess is much larger, but significantly smaller than the search space for military operations, which involve more players, more possible moves, uncertainty about the state of the world (such as the actual dispositions of the opponent's units), and the use of deception by both forces.

It is therefore clear that an automated agent cannot generate the entire game tree to find the optimal move. What it can do is to build as much of the tree as possible, and use heuristic functions to estimate the values of the generated leaf nodes which do not represent end-game positions. Of course, the closer these nodes are to end positions, the better the estimate produced by the heuristic function.
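The following sketch shows depth-limited minimax with a heuristic evaluation of the cut-off positions. The names and the game interface (moves, result, is_terminal, utility, evaluate) are illustrative assumptions, not the article's program.

```python
# Illustrative depth-limited minimax. The `game` object is assumed to supply
# moves(pos), result(pos, move), is_terminal(pos), utility(pos), and a
# heuristic evaluate(pos) for positions where the search is cut off.

def minimax(pos, depth, maximizing, game):
    if game.is_terminal(pos):
        return game.utility(pos)          # exact value of a win / draw / loss
    if depth == 0:
        return game.evaluate(pos)         # heuristic estimate of the position
    values = [minimax(game.result(pos, m), depth - 1, not maximizing, game)
              for m in game.moves(pos)]
    return max(values) if maximizing else min(values)

def best_move(pos, depth, game):
    """Choose the move leading to the successor with the best minimax value."""
    return max(game.moves(pos),
               key=lambda m: minimax(game.result(pos, m), depth - 1, False, game))
```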
It is this computational complexity that explains why only in 1997 was an automated agent (Deep Blue of IBM) able to defeat Garry Kasparov, the reigning world champion. The program ran on a very powerful parallel computer, generating up to 30 billion positions per move to explore about 14 moves in advance. It contained a database of about 4000 opening positions, 700,000 grandmaster games, and a large number of end-game solutions, coupled with a heuristic evaluation function based on about 8000 features.31

In general, game playing agents are better than humans in games where they can search much of the game space (such as Othello). But they are much weaker in games where the search space is very large, such as Go.

Case-based reasoning is a form of problem solving by analogy in which a new problem is solved by recognizing its similarity to a previously solved problem (which could be classifying the disease of a patient, planning a meal, or designing a circuit), then transferring and adjusting the solution to the new problem.32

Another general problem solving method that has been employed in expert systems for a wide variety of tasks, including planning, design, critiquing, symbolic integration, and intelligence analysis, is problem reduction.2,33 In this approach a problem is solved by successively reducing it, top-down, to simpler problems, finding the solutions of the simplest problems, and combining these solutions, bottom-up, to obtain the solution of the initial problem. A simple illustration of this method is shown in Figure 5, where the problem 'Assess whether the United States will be a global leader in wind power within the next decade' is first reduced to three simpler problems (based on a question and its answer), each assessing whether the United States has the reasons, the desire and, respectively, the capability to be a global leader. Each of these problems is further reduced to even simpler problems (guided by other questions and answers). For example, the middle problem is reduced to three other problems. These top-down problem reductions continue until one reaches problems which have known solutions. Then these solutions are successively combined, bottom-up, to obtain the solutions of the upper-level problems, and of the top-level problem. As shown in Figure 5, these solutions are probabilistic (e.g., 'It is almost certain that the people of the United States desire the United States to be a global leader in wind power within the next decade.') and are combined using operators such as min, max, average, or weighted sum.

An important characteristic of the problem reduction method is that it shows very clearly the reasoning logic, making it suitable for developing knowledge-based agents that assist experts and non-experts in problem-solving, and teach expert problem solving to students.
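A minimal sketch of the bottom-up synthesis step follows. The ordered qualitative probability scale and the function names are illustrative assumptions; Figure 5 only shows the min, max, and average operators being applied to such qualitative solutions.

```python
# Illustrative bottom-up synthesis of probabilistic solutions, in the spirit
# of Figure 5. The ordered qualitative scale below is an assumption.
SCALE = ["no support", "likely", "very likely", "almost certain", "certain"]

def as_index(solution):
    return SCALE.index(solution)

def synthesize(solutions, operator):
    """Combine sub-problem solutions with a min, max, or average operator."""
    indices = [as_index(s) for s in solutions]
    if operator == "min":
        return SCALE[min(indices)]
    if operator == "max":
        return SCALE[max(indices)]
    if operator == "average":
        return SCALE[round(sum(indices) / len(indices))]
    raise ValueError("unknown operator: " + operator)

# The middle reduction of Figure 5: the desires of the people, the major
# political parties, and the energy industries, combined by averaging.
desire = synthesize(["almost certain", "very likely", "likely"], "average")
print(desire)                                                        # 'very likely'

# The top-level reduction: reasons, desire, and capability, combined by min.
print(synthesize(["almost certain", desire, "almost certain"], "min"))   # 'very likely'
```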
KNOWLEDGE ACQUISITION AND LEARNING

Much of the power of an intelligent agent derives from the knowledge in its knowledge base (Figure 1). A main goal of the knowledge acquisition and machine learning research is precisely to enable an agent to acquire or learn this knowledge from a user, from input data, or from the agent's own problem solving experience. This results in improving the competence of the agent in solving a broader class of problems, and in making fewer mistakes in problem solving. It may also result in improving the efficiency of the


agent in solving the problems faster and with less memory.

FIGURE 5 | Problem solving through reduction and synthesis. (The top-level problem 'Assess whether the United States will be a global leader in wind power within the next decade' (solution: very likely) is reduced, through the question 'What factors should we consider?' and the answer 'Reasons, desire, and capability', to three sub-problems assessing whether the United States has the reasons (almost certain), the desire (very likely), and the capability (almost certain) to be a global leader in wind power within the next decade; their solutions are combined with the min operator. The desire sub-problem is further reduced, through the question 'Who are the main stakeholders who determine the desire of the United States?' and the answer 'The people, the major political parties, and the energy industries, because the United States has a democratic government', to assessments of the desires of the people (almost certain), the major political parties (very likely), and the energy industries (likely), which are combined with the average operator.)

Because of the high complexity of learning, much of the research has focused on the basic task of concept learning, such as learning the concept 'cup', or the concept 'person who will default on a bank loan'. In essence, concept learning consists in finding a classification function which distinguishes the entities that are instances of the concept from those that are not. Many of the developed learning strategies can be characterized as empirical inductive learning from examples, which consists of learning the definition of a concept by comparing positive and negative examples of the concept in terms of their similarities and differences, and inductively creating a generalized description of the similarities of the positive examples.34,35 Some methods are based on information theory to learn the concept in the form of a decision tree36 that is used to classify the objects. Other methods represent the learned concept as a neural network, whose output unit determines whether the entity at its input units belongs or not to the concept. Learning in a neural network consists in continuously classifying known examples and updating the weights associated with the connections between the units, to improve the recognition accuracy (Ref 37, pp. 81–127). Support vector classifiers map the positive and the negative examples of the concept, nonlinearly, into a higher-dimensional feature space via a kernel function, and construct a separating hyperplane there with maximum margin, which yields a nonlinear decision boundary in the input space.38 Bayesian classifiers determine the most likely hypothesis or concept by using the Bayes rule P(H|E) = P(E|H)P(H)/P(E), which computes the posterior probability of the hypothesis H based on its prior probability and the observed evidence. This type of learning proved to be very effective in applications where prior probabilities can be computed, and is extensively used for statistical natural language processing (Ref 37, pp. 154–200).
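As a small, illustrative example of such a Bayesian classifier, the following naive Bayes sketch (with a made-up training set and feature names, not code or data from the article) classifies entities as positive or negative examples of a concept from a few discrete features:

```python
from collections import defaultdict

# Illustrative naive Bayes concept classifier: P(H|E) is proportional to
# P(H) * product of P(e_i|H), with counts estimated from a tiny, made-up data set.
def train(examples):
    class_counts = defaultdict(int)
    feature_counts = defaultdict(int)
    for features, label in examples:
        class_counts[label] += 1
        for i, value in enumerate(features):
            feature_counts[(label, i, value)] += 1
    return class_counts, feature_counts

def classify(features, class_counts, feature_counts):
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total                       # prior P(H)
        for i, value in enumerate(features):
            # likelihood P(e_i|H) with add-one (Laplace) smoothing for binary features
            score *= (feature_counts[(label, i, value)] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Features: (has_handle, holds_liquid); concept: is the entity a cup?
data = [(("yes", "yes"), "cup"), (("yes", "yes"), "cup"),
        (("no", "yes"), "not-cup"), (("no", "no"), "not-cup")]
cc, fc = train(data)
print(classify(("yes", "yes"), cc, fc))   # expected: 'cup'
```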
There are many other learning strategies besides inductive concept learning from examples.


For instance, explanation-based learning consists of learning an operational definition of a concept by proving that an example is an instance of the concept and by deductively generalizing the proof.39,40 As a result, the agent identifies the important features of the concept, allowing it to recognize much faster the positive examples of the concept, by simply checking that they have these features. Analogical learning consists of learning new knowledge about an entity by transferring it from a similar entity, and by testing it.41,42 Abductive learning consists of hypothesizing causes based on observed effects.43 Conceptual clustering consists of classifying a set of objects into different classes/concepts and in learning a description of each such class/concept.44 Quantitative discovery consists in discovering a quantitative law relating the values of variables characterizing an object or a system.45 Reinforcement learning consists of improving the agent's knowledge based on feedback from the environment.46 Genetic algorithm-based learning consists of evolving a population of individuals over a sequence of generations, based on models of heredity and evolution.47
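As one concrete instance of these strategies, here is a minimal sketch of the tabular Q-learning update used in reinforcement learning. This is an illustrative formulation only; the learning rate, discount, and exploration values are arbitrary.

```python
from collections import defaultdict
import random

# Illustrative tabular Q-learning: improve an action-value estimate Q(s, a)
# from the reward feedback received after each action.
Q = defaultdict(float)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # learning rate, discount, exploration

def choose_action(state, actions):
    if random.random() < EPSILON:                        # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])     # otherwise exploit

def update(state, action, reward, next_state, next_actions):
    best_next = max(Q[(next_state, a)] for a in next_actions)
    # move Q(s, a) toward the observed reward plus the discounted future estimate
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```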
These learning methods may be used to extend the ontology of an agent with new concepts or facts, or to learn and refine its reasoning rules. Many of these methods are complementary in terms of the input from which the agent learns, the a priori knowledge the agent needs in order to learn, the type of inferences made during learning, what is actually learned, and the effect of learning on the agent's performance. For instance, in the case of empirical inductive learning from examples, in which the primary type of inference is induction, the input may consist of many (positive and/or negative) examples of some concept C, the knowledge base usually contains only a small amount of knowledge related to the input, and the goal is to learn a description of the concept C in the form of an inductive generalization of the positive examples which does not cover the negative examples. This description extends or refines the knowledge base and improves the competence of the agent in solving a larger class of problems and in making fewer mistakes.

In the case of explanation-based learning, in which the primary type of inference is deduction, the input may consist of only one example of a concept C, the knowledge base should contain complete knowledge about the input, and the goal is to learn an operational description of C in the form of a deductive generalization of the input example. This description is a reorganization of some knowledge pieces from the knowledge base and improves the problem solving efficiency of the agent.

Both analogical learning and abductive learning extend the knowledge base with new pieces of knowledge and usually improve the competence of the agent. In the case of analogical learning, the input may consist of a new entity I, the knowledge base should contain an entity S which is similar to I, and the goal is to learn new knowledge about the input I by transferring it from the known entity S. In abductive learning, the input may be a fact F, the knowledge base should contain knowledge related to the input, and the goal is to learn a new piece of knowledge that would explain the input.

Each learning method, used separately, has limited applicability because it requires a special type of input and background knowledge, and it learns a specific type of knowledge. On the other hand, the complementary nature of these requirements and results naturally suggests that by properly integrating these single-strategy methods, one can obtain a synergistic effect in which different strategies mutually support each other and compensate for each other's weaknesses. A large number of multistrategy learning agents that integrate various learning strategies have been developed.48,49 Some integrate empirical induction with explanation-based learning, while others integrate symbolic and neural net learning, or deduction with abduction and analogy, or quantitative and qualitative discovery, or symbolic and genetic algorithm-based learning, and so on.

A type of multistrategy learning is that employed by the Disciple agents, which can be trained how to perform their tasks, in ways that are similar to how one would train students or apprentices, through specific examples and explanations, and through the supervision and correction of their behavior.2,50 For instance, an expert may show a Disciple agent the reasoning tree from Figure 5. Each question/answer pair following a problem represents the explanation of why that problem is decomposed in the indicated way. By understanding and generalizing these reasoning steps and their explanations, the agent learns general reasoning rules. It then analogically applies these rules to solve similar problems, such as 'Assess whether China will be a global leader in solar power within the next decade.' The expert analyzes the agent's reasoning and characterizes each step as correct or incorrect, also helping the agent in understanding its mistakes. Each of these reasoning steps represents a positive or a negative example for a previously learned rule, which is appropriately generalized to cover the positive example, or specialized to no longer cover the negative example. As the agent learns new rules and concepts from the expert, their interaction evolves from a teacher-student interaction toward an interaction


where they both collaborate in problem-solving. This process is based on mixed-initiative problem solving,51 where the expert solves the more creative parts of the problem and the agent solves the routine ones and learns from the creative solutions; integrated learning and teaching, where the expert helps the agent to learn (e.g., by providing representative examples, hints, and explanations), and the agent helps the expert to teach it (e.g., by presenting attempted solutions and by asking relevant questions); and, as already mentioned, multistrategy learning, where the agent integrates complementary strategies, such as learning from examples, learning from explanations, and learning by analogy, to learn general concepts and rules.

NATURAL LANGUAGE, SPEECH, AND VISION

The perceptual processing module in Figure 1 summarizes the agent's capabilities to process natural language, speech, and visual inputs. All are very easy for humans and very difficult for automated agents.

When an agent receives input in natural language, it has to understand it, that is, to build an internal representation of its meaning, which can then be used by the problem solving engine. This process, however, is very difficult for several reasons. Natural language is ambiguous at all levels: morphology, syntax, semantics, and discourse.52,53 Just by hearing a common word such as 'run' we cannot say whether it is a noun or a verb. The WordNet semantic dictionary54 gives 16 senses for the noun interpretation and 41 senses for the verb interpretation. What does the word 'diamond' mean? Does it mean the mineral consisting of nearly pure carbon in crystalline form? Does it mean a gem or other piece cut from this mineral? Does it mean a lozenge-shaped plane figure? Does it mean the playing field in baseball? Because the meaning of a word cannot be determined in isolation, it needs to be interpreted in the context of its surrounding words. But sentences themselves may be ambiguous. What does 'Visiting relatives can be boring' mean? Does it mean that the act of visiting relatives can be boring? Maybe it means that the relatives who visit us can be boring. Consider also the possible meanings of the following sentence: 'She told the man that she hated to run alone.' Therefore the meanings of individual sentences need themselves to be interpreted in the context of the paragraphs that contain them. This is also necessary because of additional complexities of natural language, such as the use of paraphrases (where the same meaning may be expressed by different sentences), ellipses (the use of sentences that appear ill-formed because they are incomplete, which requires the extraction of the missing parts from previous sentences), and references (where entities are referred to by pronouns, such as 'it' or 'they', without giving their names). But even considering larger paragraphs may not be enough for understanding their meaning, unless the agent has a large amount of knowledge about the domain of discourse.

As illustrated in Figure 6, understanding a natural language sentence (such as 'John read the book.') involves a sequence of stages, including morphological analysis (which analyzes individual words and assigns syntactic categories to them), syntactic analysis (which produces a parse tree that identifies the syntactic components of the sentence, such as noun phrase and verb phrase), and semantic analysis (which produces an internal representation of the meaning of the sentence). The follow-on phase of discourse integration performs a global analysis of all the sentences and completes their meaning. For example, the agent determines that 'it' in the discourse 'John read the book. It was very interesting.' refers to the book. Finally, pragmatic analysis reinterprets the structure representing what was said in order to determine what to do. In the above example, the intended effect is declarative and the agent just records what was said. In the case of a question to a database, the pragmatic analysis results in the translation of the semantic representation of the question into a query in the language of the database system.55
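A toy sketch of the first stages of this pipeline follows. The lexicon, the grammar, and all names are invented here for illustration; real systems use far richer grammars or statistical models.

```python
# Illustrative morphological, syntactic, and semantic analysis of a sentence
# such as 'John read the book.' (toy lexicon and grammar, invented for this sketch).
LEXICON = {"john": "proper_noun", "read": "verb", "the": "determiner", "book": "noun"}

def morphological_analysis(sentence):
    """Assign a syntactic category to each word."""
    words = sentence.rstrip(".").lower().split()
    return [(w, LEXICON[w]) for w in words]

def syntactic_analysis(tagged):
    """Parse the tagged words into a tiny S -> NP VP parse tree."""
    (subj, _), (verb, _), (det, _), (noun, _) = tagged
    return ("sentence",
            ("noun_phrase", ("proper_noun", subj)),
            ("verb_phrase", ("verb", verb),
             ("noun_phrase", ("determiner", det), ("noun", noun))))

def semantic_analysis(tree):
    """Produce a simple internal representation of the sentence's meaning."""
    _, (_, (_, subj)), (_, (_, verb), (_, _, (_, obj))) = tree
    return {"event": verb + "1", "type": verb + "ing", "agent": subj, "object": obj + "1"}

tagged = morphological_analysis("John read the book.")
tree = syntactic_analysis(tagged)
print(semantic_analysis(tree))
# {'event': 'read1', 'type': 'reading', 'agent': 'john', 'object': 'book1'}
```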
These stages may be performed in sequence or in parallel. They are based on various types of grammars which specify the rules for well-formed expressions, and are augmented with semantic interpretations. Early natural language understanding systems were based on manually defined grammars and were limited in their coverage of a given natural language. Therefore successful systems were developed for restricted areas of discourse, such as airline reservation or question answering in a specific domain.

The availability of large language corpora on the World Wide Web has had a very significant impact on the field of natural language processing, both in terms of the methods and techniques involved (which are now mainly based on probability and statistics) and in terms of its applications. In these approaches, prior probabilities are associated with specific words and with the rules of the grammar. This allows one to determine the probability distribution of the various meanings of an ambiguous word or sentence, and to determine the most likely meaning in a given context.11,56


FIGURE 6 | Stages in understanding a natural language sentence. (Morphological analysis assigns the categories proper noun, verb, determiner, and noun to the words of 'John read the book'; syntactic analysis produces a parse tree with sentence, noun phrase, and verb phrase nodes; semantic analysis produces an internal representation in which reading1 is an instance of reading, with John, an instance of person, as its agent, and book1, an instance of book, as its object.)

A type of very successful application of statistical natural language processing and concept learning is classifying a text into one of several categories. Examples of such applications include identifying the language of a text, identifying whether a product review is positive or negative, or whether an email message is spam or non-spam, with recognition accuracy in the range of 98–99% and, in some cases, exceeding 99.9% (Ref 1, p. 866).

In information retrieval, a user specifies a query in natural language and the agent has to return the most relevant documents from a given knowledge repository, such as the World Wide Web. In question answering, the user desires an answer to their question, rather than a document. Because of the huge amount of information on the web, which is likely to contain many answers to a question, and in many forms, an agent has to understand what the question is about (topic) and what the user is interested in (focus). It can then provide a simple template for the expected answer rather than trying all the possible paraphrases based on deep natural language understanding. This is because it is very likely that the simple answer template will match some text on the web.
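As an illustration of such template matching (a deliberately simplified sketch; the pattern and the text snippet are invented for the example), the expected answer can be expressed as a simple text pattern applied to retrieved web text:

```python
import re

# Illustrative answer-template matching for question answering.
# For a question such as 'Who wrote Principia Mathematica?', instead of a deep
# analysis we look for text matching the answer template '<someone> wrote <title>'.
def answer_from_template(question_focus, documents):
    pattern = re.compile(r"([\w\s.]+?)\s+wrote\s+" + re.escape(question_focus))
    for doc in documents:
        match = pattern.search(doc)
        if match:
            return match.group(1).strip()
    return None

docs = ["Whitehead and Russell wrote Principia Mathematica in three volumes."]
print(answer_from_template("Principia Mathematica", docs))   # 'Whitehead and Russell'
```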
As mentioned several times, an intelligent agent needs a huge amount of knowledge in its knowledge base, which is very difficult to define. With the progress made in statistical natural language processing, some of this knowledge, such as parts of the ontology (i.e., concepts, instances, relationships; see Figure 2), can be automatically learned.

Finally, another important area of natural language processing is automatic translation from one natural language into another. All translation systems use some models of the source and target languages. Classical approaches attempt to understand the source language text, translate it into an interlingua representation, and then generate sentences in the target language from that representation. In statistical approaches, translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual parallel text corpora.

Speech recognition consists in identifying the spoken words from their acoustic signals, and is difficult because these signals are both ambiguous and noisy. Therefore, the most successful approaches are also based on statistical methods. Both machine translation and speech recognition are among the biggest successes of artificial intelligence, and are part of many applications.57

Research on vision concerns the development of algorithms allowing an agent to extract information from its environment to recognize and manipulate objects, and to navigate.58 Many algorithms have been developed that detect the edges, texture, and surfaces of objects, and segment an image into its main components. However, recognizing the component objects in a scene, or understanding the scene, remain very difficult problems.
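As a tiny illustration of such low-level vision processing, the following sketch computes a Sobel-style gradient magnitude on a made-up grayscale image (the image and the threshold-free output are invented for the example, not taken from the article); edges appear where the gradient is large:

```python
# Illustrative edge detection: Sobel gradient magnitude on a tiny grayscale
# image represented as a list of rows (pixel values made up for the example).
IMAGE = [[0, 0, 0, 255, 255],
         [0, 0, 0, 255, 255],
         [0, 0, 0, 255, 255],
         [0, 0, 0, 255, 255]]

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def gradient_magnitude(image):
    rows, cols = len(image), len(image[0])
    edges = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            edges[r][c] = (gx ** 2 + gy ** 2) ** 0.5
    return edges

for row in gradient_magnitude(IMAGE):
    print(row)   # large values mark the vertical edge between the dark and bright halves
```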
ACTION PROCESSING AND ROBOTICS

The action processing module in Figure 1 corresponds to the agent's actions upon that environment, aimed at realizing the goals or tasks for which it was designed. Such an action could be the generation of an answer to a question, the solution of an input problem, the manipulation of an object, or the navigation to a new position. Since most of these actions have already been addressed in the above sections, here we only address object manipulation and navigation, which are the main concern of the robotics area.59

One may distinguish between three main types of robots: (a) manipulators, which are robotic arms attached to their workspace, such as those used in car assembly and painting; (b) mobile robots with wheels, legs, or wings, used to move objects around, such


as the unmanned air vehicles for surveillance, crop-spraying, or military operations, or the autonomous underwater vehicles; and (c) mobile manipulators that combine mobility with manipulation to accomplish more complex tasks. A challenging problem in robotics is localization and mapping, which consists in finding out where things are and building a map of the environment. Another challenging problem is path planning from one point in space to another point, which may involve compliant motion, where the robot moves while maintaining physical contact with an object (e.g., an obstacle, a box it pushes, or a screw it inserts).

Robots have many applications in industry (e.g., for part assembly or painting), agriculture (e.g., as special machines), transportation (e.g., autonomous vehicles), health care (e.g., as devices for surgery), hazardous environments (e.g., for clearing minefields or cleaning up nuclear waste), space exploration, and entertainment. They can provide personal services (e.g., vacuum cleaning), or can act as human augmentation devices (e.g., by providing additional force to facilitate walking or arm movement).

CONCLUSION

The main goal of AI is to develop computational agents that exhibit the characteristics we associate with intelligence in human behavior. Such an agent has an internal representation of its external environment which is at the basis of its reasoning abilities. In general, an agent solves complex real-world problems by using large amounts of knowledge and heuristic methods. It is highly desirable that the agent's knowledge and reasoning are understandable to humans, and that the agent is able to explain its behavior, what decisions it is making, and why. The agent may reason with data items that are more or less in contradiction with one another, and may provide some solution without having all the relevant data. The agent should be able to communicate with its users, ideally in natural language, and it may continuously learn.

Why are intelligent agents important? Because humans have limitations that agents may alleviate, such as limited attention span, the ability to analyze only a small number of alternatives at a time, or memory for details that is affected by stress, fatigue, or time constraints. Humans are slow, sloppy, forgetful, implicit, and subjective. But they have common sense and intuition, and may find creative solutions in new situations. By contrast, agents are fast, rigorous, precise, explicit, and objective. But they lack common sense and the ability to deal with novel situations.60,61 Humans and agents may thus engage in mixed-initiative reasoning that takes advantage of their complementary strengths and reasoning styles. As such, intelligent agents enable us to do our tasks better, and help us in coping with the increasing challenges of globalization and the rapid evolution toward the knowledge economies.62

REFERENCES
1. Russell SJ, Norvig P. Artificial Intelligence: A Modern Approach. Upper Saddle River, NJ: Prentice-Hall; 2010.
2. Tecuci G. Building Intelligent Agents: An Apprenticeship Multistrategy Learning Theory, Methodology, Tool and Case Studies. San Diego, CA: Academic Press; 1998.
3. Turing A. Computing machinery and intelligence. Mind 1950, 59:433–460.
4. Samuel AL. Some studies in machine learning using the game of checkers. IBM J Res Dev 1959, 3:210–229.
5. Robinson JA. A machine-oriented logic based on the resolution principle. JACM 1965, 12:23–41.
6. Buchanan BG, Sutherland GL, Feigenbaum EA. Heuristic DENDRAL: a program for generating explanatory hypotheses in organic chemistry. In: Meltzer B, Michie D, Swan M, eds. Machine Intelligence, vol 4. Edinburgh: Edinburgh University Press; 1969, 209–254.
7. Buchanan BG, Shortliffe EH. Rule Based Expert Systems. Reading, MA: Addison-Wesley; 1984.
8. Anderson JR. The Architecture of Cognition. Cambridge, MA: Harvard University Press; 1983.
9. Laird J, Newell A, Rosenbloom PS. SOAR: an architecture for general intelligence. Artif Intell J 1987, 33:1–64.
10. Tecuci G. DISCIPLE: a theory, methodology and system for learning expert knowledge. Thèse de Docteur en Science, University of Paris-South, 1988.
11. Manning C, Schütze H. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press; 1999.
12. Rich E, Knight K. Artificial Intelligence. New York, NY: McGraw-Hill; 1991.
13. Brachman RJ, Levesque HJ. Readings in Knowledge Representation. San Mateo, CA: Morgan Kaufmann; 1985.
14. Brachman RJ, Levesque H. Knowledge Representation and Reasoning. San Francisco, CA: Morgan Kaufmann; 2004.
15. McCarthy J. Programs with common sense. In: Minsky ML, ed. Semantic Information Processing. Cambridge, MA: MIT Press; 1968, 403–418.
16. Kowalski R. Logic for Problem Solving. Amsterdam, London, New York: Elsevier, North-Holland; 1979.
17. Genesereth MR, Nilsson NJ. Logical Foundations of Artificial Intelligence. San Mateo, CA: Morgan Kaufmann; 1987.
18. Brownston L, Farrell R, Kant E, Martin N. Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming. Reading, MA: Addison-Wesley; 1985.
19. Quillian MR. Semantic memory. In: Minsky M, ed. Semantic Information Processing. Cambridge, MA: MIT Press; 1968, 227–270.
20. Minsky M. A framework for representing knowledge. In: Winston PH, ed. The Psychology of Computer Vision. New York: McGraw-Hill; 1975, 211–277.
21. Bobrow DG, Winograd T. An overview of KRL, a knowledge representation language. Cognit Sci 1977, 1:3–46.
22. Sowa JF. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA: Brooks/Cole; 1984.
23. Brachman RJ, Schmolze JG. An overview of the KL-ONE knowledge representation system. Cognit Sci 1985, 9:171–216.
24. Lenat DB, Guha RV. Building Large Knowledge-Based Systems: Representation and Inference in the CYC Project. Reading, MA: Addison-Wesley; 1990.
25. Pearl J. Causality: Models, Reasoning, and Inference. New York: Cambridge University Press; 2009.
26. Cohen LJ. The Probable and the Provable. Oxford: Clarendon Press; 1977.
27. Shafer G. A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press; 1976.
28. Zadeh L. The role of fuzzy logic in the management of uncertainty in expert systems. Fuzzy Sets Syst 1983, 11:199–227.
29. Schum DA. The Evidential Foundations of Probabilistic Reasoning. Evanston, IL: Northwestern University Press; 2001.
30. OWL 2 Web Ontology Language. Available at: http://www.w3.org/TR/owl2-overview/ (Accessed August 14, 2011).
31. Campbell MS, Hoane AJ, Hsu F-H. Deep Blue. Artif Intell J 2002, 134:57–83.
32. Leake DB. Case-Based Reasoning: Experiences, Lessons and Future Directions. MIT Press; 1996.
33. Nilsson NJ. Problem Solving Methods in Artificial Intelligence. New York: McGraw-Hill; 1971.
34. Mitchell TM. Version spaces: an approach to concept learning. Doctoral Dissertation. Stanford, CA: Stanford University; 1978.
35. Michalski RS. A theory and methodology of inductive learning. In: Michalski RS, Carbonell JG, Mitchell TM, eds. Machine Learning: An Artificial Intelligence Approach, vol 1. Palo Alto, CA: Tioga Publishing Co; 1983, 83–129.
36. Quinlan JR. Induction of decision trees. Mach Learn 1986, 1:81–106.
37. Mitchell TM. Machine Learning. Boston, MA: McGraw-Hill; 1997.
38. Hearst MA, Dumais ST, Osman E, Platt J, Schölkopf B. Support vector machines. IEEE Intell Syst 1998, 13:18–28.
39. Mitchell TM, Keller RM, Kedar-Cabelli ST. Explanation-based generalization: a unifying view. Mach Learn 1986, 1:47–80.
40. DeJong G, Mooney R. Explanation-based learning: an alternative view. Mach Learn 1986, 1:145–176.
41. Winston PH. Learning and reasoning by analogy. Commun ACM 1980, 23:689–703.
42. Gentner D. Structure mapping: a theoretical framework for analogy. Cognit Sci 1983, 7:155–170.
43. Josephson J, Josephson S. Abductive Inference. Cambridge University Press; 1994.
44. Fisher DH. Knowledge acquisition via incremental conceptual clustering. Mach Learn 1987, 2:139–172.
45. Langley P, Simon HA, Bradshaw GL, Zytkow JM. Scientific Discovery: Computational Explorations of the Creative Processes. Cambridge, MA: MIT Press; 1987.
46. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J AI Res 1996, 4:237–285. Available at: http://www.jair.org/. (Accessed November 11, 2011).
47. DeJong K. Evolutionary Computation: Theory and Practice. MIT Press; 2006.
48. Tecuci G. Plausible justification trees: a framework for the deep and dynamic integration of learning strategies. Mach Learn J 1993, 11:237–261.
49. Michalski RS, Tecuci G, eds. Machine Learning: A Multistrategy Approach. San Mateo, CA: Morgan Kaufmann Publishers; 1994.
50. Tecuci G, Boicu M, Boicu C, Marcu D, Stanescu B, Barbulescu M. The Disciple-RKF learning and reasoning agent. Comput Intell 2005, 21:462–479.
51. Tecuci G, Boicu M, Cox MT. Seven aspects of mixed-initiative reasoning: an introduction to the special issue on mixed-initiative assistants. AI Mag 2007, 28:11–18.
52. Tufis D, Ion R, Ide N. Fine-grained word sense disambiguation based on parallel corpora, word alignment, word clustering and aligned wordnets. Proceedings of the 20th International Conference on Computational Linguistics, COLING 2004, Geneva; 2004, 1312–1318.
53. Tufis D. Language engineering for lesser-studied languages. In: Nirenburg S, ed. Algorithms and Data Design Issues for Basic NLP Tools. NATO Science for Peace and Security Series - D: Information and Communication Security. IOS Press; 2009, 3–50.
54. WordNet: A Lexical Database for English. Available at: http://wordnetweb.princeton.edu/perl/webwn. (Accessed October 1, 2011).
55. Allen J. Natural Language Understanding. Redwood City, CA: The Benjamin/Cummings Publishing Company, Inc; 1995.
56. Jurafsky D, Martin JH. Speech and Language Processing. Prentice Hall; 2008.
57. Huang XD, Acero A, Hon H. Spoken Language Processing. Upper Saddle River, NJ: Prentice Hall; 2001.
58. Forsyth D, Ponce J. Computer Vision: A Modern Approach. Upper Saddle River, NJ: Prentice Hall; 2002.
59. Bekey G, Ambrose R, Kumar V, Lavery D, Sanderson A, Wilcox B, Yuh J, Zheng Y. Robotics: State of the Art and Future Challenges. London, England: Imperial College Press; 2008.
60. Turoff M. Design of interactive systems. In: Emergency Management Information Systems Tutorial. The Hawaii International Conference on System Sciences, HICSS-40, Hawaii; 2007.
61. Phillips-Wren G, Jain LC, Nakamatsu K, Howlett RJ, eds. Advances in Intelligent Decision Technologies, SIST 4. Berlin, Heidelberg: Springer-Verlag; 2010.
62. Toward Knowledge Societies. Available at: http://unesdoc.unesco.org/images/0014/001418/141843e.pdf. (Accessed October 1, 2011).
