MACHINE INTELLIGENCE
28 May 2015 / Vol 521 / Issue No 7553
Cover illustration: Nik Spencer
Editor, Nature: Philip Campbell
Publishing: Richard Hughes
Production Editor: Jenny Rooke
Art Editor: Nik Spencer
Sponsorship: Reya Silao
Production: Ian Pope
Marketing: Steven Hurst
Editorial Assistants: Rebecca White, Melissa Rose
28 MAY 2015 | VOL 521 | NATURE | 435
REVIEW
doi:10.1038/nature14539
Deep learning
Yann LeCun1,2, Yoshua Bengio3 & Geoffrey Hinton4,5
Deep learning allows computational models that are composed of multiple processing layers to learn representations of
data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep
learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine
should change its internal parameters that are used to compute the representation in each layer from the representation in
the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and
audio, whereas recurrent nets have shone light on sequential data such as text and speech.
Deep learning has turned out to be very good at discovering intricate structures in high-dimensional data and is therefore applicable to many domains of science, business and government. In addition to beating records in image recognition1–4 and speech recognition5–7, it
has beaten other machine-learning techniques at predicting the activity of potential drug molecules8, analysing particle accelerator data9,10,
reconstructing brain circuits11, and predicting the effects of mutations
in non-coding DNA on gene expression and disease12,13. Perhaps more
surprisingly, deep learning has produced extremely promising results
for various tasks in natural language understanding14, particularly
topic classification, sentiment analysis, question answering15 and language translation16,17.
We think that deep learning will have many more successes in the
near future because it requires very little engineering by hand, so it
can easily take advantage of increases in the amount of available computation and data. New learning algorithms and architectures that are
currently being developed for deep neural networks will only accelerate this progress.
Supervised learning
The most common form of machine learning, deep or not, is supervised learning. Imagine that we want to build a system that can classify
images as containing, say, a house, a car, a person or a pet. We first
collect a large data set of images of houses, cars, people and pets, each
labelled with its category. During training, the machine is shown an
image and produces an output in the form of a vector of scores, one
for each category. We want the desired category to have the highest
score of all categories, but this is unlikely to happen before training.
We compute an objective function that measures the error (or distance) between the output scores and the desired pattern of scores. The
machine then modifies its internal adjustable parameters to reduce
this error. These adjustable parameters, often called weights, are real
numbers that can be seen as knobs that define the input-output function of the machine. In a typical deep-learning system, there may be
hundreds of millions of these adjustable weights, and hundreds of
millions of labelled examples with which to train the machine.
To properly adjust the weight vector, the learning algorithm computes a gradient vector that, for each weight, indicates by what amount
the error would increase or decrease if the weight were increased by a
tiny amount. The weight vector is then adjusted in the opposite direction to the gradient vector.
The objective function, averaged over all the training examples, can
1Facebook AI Research, 770 Broadway, New York, New York 10003, USA. 2New York University, 715 Broadway, New York, New York 10003, USA. 3Department of Computer Science and Operations Research, Université de Montréal, Pavillon André-Aisenstadt, PO Box 6128 Centre-Ville STN, Montréal, Quebec H3C 3J7, Canada. 4Google, 1600 Amphitheatre Parkway, Mountain View, California 94043, USA. 5Department of Computer Science, University of Toronto, 6 King's College Road, Toronto, Ontario M5S 3G4, Canada.
be seen as a kind of hilly landscape in the high-dimensional space of
weight values. The negative gradient vector indicates the direction
of steepest descent in this landscape, taking it closer to a minimum,
where the output error is low on average.
In practice, most practitioners use a procedure called stochastic
gradient descent (SGD). This consists of showing the input vector
for a few examples, computing the outputs and the errors, computing
the average gradient for those examples, and adjusting the weights
accordingly. The process is repeated for many small sets of examples
from the training set until the average of the objective function stops
decreasing. It is called stochastic because each small set of examples
gives a noisy estimate of the average gradient over all examples. This
simple procedure usually finds a good set of weights surprisingly
quickly when compared with far more elaborate optimization techniques18. After training, the performance of the system is measured
on a different set of examples called a test set. This serves to test the
generalization ability of the machine: its ability to produce sensible
answers on new inputs that it has never seen during training.
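The loop just described (show a few examples, average their gradients, nudge the weights opposite to the gradient) fits in a dozen lines of Python. The linear model, toy data set and learning rate below are illustrative assumptions, not part of the article:

```python
import random

# Toy data: inputs x paired with targets from the function y = 3x + 2.
data = [(x, 3.0 * x + 2.0) for x in [i / 10.0 for i in range(-20, 21)]]

w, b = 0.0, 0.0          # adjustable parameters, the "knobs"
lr = 0.05                # learning rate: size of each downhill step

random.seed(0)
for step in range(2000):
    batch = random.sample(data, 4)          # a few examples give a noisy gradient
    gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
    gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
    w -= lr * gw                            # move opposite to the gradient
    b -= lr * gb

print(w, b)                                 # approaches the true values 3 and 2
```

Each pass uses only a small sample of the data, so the gradient estimate jumps around; averaged over many steps the weights still settle near the minimum, which is exactly the "stochastic" behaviour the text describes.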
[Figure 1, panels a–d; the layout is not recoverable, but the equations are. a, A network with 2 input units, 2 sigmoid hidden units and 1 sigmoid output unit. b, The chain rule: Δy = (∂y/∂x)Δx and Δz = (∂z/∂y)Δy, so Δz = (∂z/∂y)(∂y/∂x)Δx. c, Forward pass, for input units i, hidden units j ∈ H1 and k ∈ H2, and output units l: z_j = Σ_{i ∈ Input} w_ij x_i with y_j = f(z_j); z_k = Σ_{j ∈ H1} w_jk y_j with y_k = f(z_k); z_l = Σ_{k ∈ H2} w_kl y_k with y_l = f(z_l). d, Backward pass: ∂E/∂y_k = Σ_{l ∈ out} w_kl ∂E/∂z_l; ∂E/∂z_k = (∂E/∂y_k)(∂y_k/∂z_k); ∂E/∂y_j = Σ_{k ∈ H2} w_jk ∂E/∂z_k; ∂E/∂z_j = (∂E/∂y_j)(∂y_j/∂z_j).]
Figure 1 | Multilayer neural networks and backpropagation. a, A multilayer neural network (shown by the connected dots) can distort the input
space to make the classes of data (examples of which are on the red and
blue lines) linearly separable. Note how a regular grid (shown on the left)
in input space is also transformed (shown in the middle panel) by hidden
units. This is an illustrative example with only two input units, two hidden
units and one output unit, but the networks used for object recognition
or natural language processing contain tens or hundreds of thousands of
units. Reproduced with permission from C. Olah (http://colah.github.io/).
b, The chain rule of derivatives tells us how two small effects (that of a small change of x on y, and that of y on z) are composed. A small change Δx in x gets transformed first into a small change Δy in y by getting multiplied by ∂y/∂x (that is, the definition of partial derivative). Similarly, the change Δy creates a change Δz in z. Substituting one equation into the other gives the chain rule of derivatives: how Δx gets turned into Δz through multiplication by the product of ∂y/∂x and ∂z/∂y. It also works when x, y and z are vectors (and the derivatives are Jacobian matrices). c, The equations used for computing the forward pass in a neural net with two hidden layers and one output layer, each constituting a module through which one can backpropagate gradients.
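The forward- and backward-pass equations of Figure 1 can be written out directly in code. The sketch below assumes sigmoid units, a squared-error objective and small arbitrary layer sizes; it illustrates the equations, not the authors' implementation:

```python
import math, random

def f(z):  return 1.0 / (1.0 + math.exp(-z))   # y = f(z), a sigmoid unit
def df(y): return y * (1.0 - y)                # dy/dz, expressed in terms of y

random.seed(0)
sizes = [2, 3, 3, 1]                           # input, H1, H2, output
# W[l][j][i] is the weight w_ij from unit i in layer l to unit j in layer l+1
W = [[[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
     for n_in, n_out in zip(sizes, sizes[1:])]

def forward(x):
    ys = [x]                                   # unit outputs, layer by layer
    for Wl in W:                               # z_j = sum_i w_ij y_i; y_j = f(z_j)
        ys.append([f(sum(w * y for w, y in zip(wj, ys[-1]))) for wj in Wl])
    return ys

def backward(ys, target):
    # dE/dy at the output, for E = 0.5 * sum((y - t)^2)
    dy = [y - t for y, t in zip(ys[-1], target)]
    grads = []
    for l in range(len(W) - 1, -1, -1):
        dz = [dyj * df(yj) for dyj, yj in zip(dy, ys[l + 1])]    # dE/dz_j
        grads.append([[dzj * yi for yi in ys[l]] for dzj in dz]) # dE/dw_ij
        dy = [sum(W[l][j][i] * dz[j] for j in range(len(dz)))    # dE/dy_i
              for i in range(len(ys[l]))]
    return grads[::-1]

# One gradient-descent step on a single (made-up) example
ys = forward([0.5, -1.0])
for Wl, Gl in zip(W, backward(ys, [1.0])):
    for wj, gj in zip(Wl, Gl):
        for i in range(len(wj)):
            wj[i] -= 0.5 * gj[i]
```

The backward loop is a line-for-line transcription of panel d: multiply by the local derivative to go from ∂E/∂y to ∂E/∂z, take an outer product with the layer inputs to get the weight gradients, and sum over outgoing weights to pass ∂E/∂y one layer down.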
[Figure 2 | Inside a convolutional network; the image itself is not recoverable. The network takes red, green and blue input channels through convolution and max-pooling layers and outputs a score for each class: Samoyed (16); Papillon (5.7); Pomeranian (2.7); Arctic fox (1.0); Eskimo dog (0.6); white wolf (0.4); Siberian husky (0.4). Caption fragment: "... corresponding to the output for one of the learned features, detected at each of the image positions. Information flows bottom up, with lower-level features acting as oriented edge detectors, and a score is computed for each image class in output. ReLU, rectified linear unit."]
A linear classifier, or any other shallow classifier operating on raw pixels, could not possibly distinguish the latter two, while putting the former two in the same category. This is why shallow classifiers require a good feature extractor that solves the selectivity-invariance dilemma: one that produces representations that are selective to
the aspects of the image that are important for discrimination, but
that are invariant to irrelevant aspects such as the pose of the animal.
To make classifiers more powerful, one can use generic non-linear
features, as with kernel methods20, but generic features such as those
arising with the Gaussian kernel do not allow the learner to generalize well far from the training examples21. The conventional option is
to hand design good feature extractors, which requires a considerable amount of engineering skill and domain expertise. But this can
all be avoided if good features can be learned automatically using a
general-purpose learning procedure. This is the key advantage of
deep learning.
A deep-learning architecture is a multilayer stack of simple modules, all (or most) of which are subject to learning, and many of which
compute non-linear input-output mappings. Each module in the
stack transforms its input to increase both the selectivity and the
invariance of the representation. With multiple non-linear layers, say
a depth of 5 to 20, a system can implement extremely intricate functions of its inputs that are simultaneously sensitive to minute details (distinguishing Samoyeds from white wolves) and insensitive to
large irrelevant variations such as the background, pose, lighting and
surrounding objects.
From the earliest days of pattern recognition22,23, the aim of researchers has been to replace hand-engineered features with trainable multilayer networks, but despite its simplicity, the solution was not widely understood until the mid-1980s. As it turns out, multilayer architectures can be trained by simple stochastic gradient descent. As long as the modules are relatively smooth functions of their inputs and of their internal weights, one can compute gradients using the backpropagation procedure. The idea that this could be done, and that it worked, was discovered independently by several different groups during the 1970s and 1980s24–27.
The backpropagation procedure to compute the gradient of an objective function with respect to the weights of a multilayer stack of modules is nothing more than a practical application of the chain rule for derivatives. The key insight is that the derivative (or gradient) of the objective with respect to the input of a module can be computed by working backwards from the gradient with respect to the output of that module (or the input of the subsequent module) (Fig. 1). The backpropagation equation can be applied repeatedly to propagate gradients through all modules, starting from the output at the top (where the network produces its prediction) all the way to the bottom (where the external input is fed). Once these gradients have been computed, it is straightforward to compute the gradients with respect to the weights of each module.
Many applications of deep learning use feedforward neural network architectures (Fig. 1), which learn to map a fixed-size input (for example, an image) to a fixed-size output (for example, a probability for each of several categories). To go from one layer to the next, a set of units compute a weighted sum of their inputs from the previous layer and pass the result through a non-linear function. At present, the most popular non-linear function is the rectified linear unit (ReLU), which is simply the half-wave rectifier f(z) = max(z, 0). In past decades, neural nets used smoother non-linearities, such as tanh(z) or 1/(1 + exp(−z)), but the ReLU typically learns much faster in networks with many layers, allowing training of a deep supervised network without unsupervised pre-training28. Units that are not in the input or output layer are conventionally called hidden units. The hidden layers can be seen as distorting the input in a non-linear way so that categories become linearly separable by the last layer (Fig. 1).
In the late 1990s, neural nets and backpropagation were largely forsaken by the machine-learning community and ignored by the computer-vision and speech-recognition communities. It was widely thought that learning useful, multistage feature extractors with little prior knowledge was infeasible. In particular, it was commonly thought that simple gradient descent would get trapped in poor local minima: weight configurations for which no small change would reduce the average error.
In practice, poor local minima are rarely a problem with large networks. Regardless of the initial conditions, the system nearly always reaches solutions of very similar quality. Recent theoretical and empirical results strongly suggest that local minima are not a serious issue in general. Instead, the landscape is packed with a combinatorially large number of saddle points where the gradient is zero, and the surface curves up in most dimensions and curves down in the
remainder29,30. The analysis seems to show that saddle points with
only a few downward curving directions are present in very large
numbers, but almost all of them have very similar values of the objective function. Hence, it does not much matter which of these saddle
points the algorithm gets stuck at.
Interest in deep feedforward networks was revived around 2006
(refs 31–34) by a group of researchers brought together by the Canadian Institute for Advanced Research (CIFAR). The researchers introduced unsupervised learning procedures that could create layers of
feature detectors without requiring labelled data. The objective in
learning each layer of feature detectors was to be able to reconstruct
or model the activities of feature detectors (or raw inputs) in the layer
below. By pre-training several layers of progressively more complex
feature detectors using this reconstruction objective, the weights of a
deep network could be initialized to sensible values. A final layer of
output units could then be added to the top of the network and the
whole deep system could be fine-tuned using standard backpropagation33–35. This worked remarkably well for recognizing handwritten
digits or for detecting pedestrians, especially when the amount of
labelled data was very limited36.
The first major application of this pre-training approach was in
speech recognition, and it was made possible by the advent of fast
graphics processing units (GPUs) that were convenient to program37
and allowed researchers to train networks 10 or 20 times faster. In
2009, the approach was used to map short temporal windows of coefficients extracted from a sound wave to a set of probabilities for the
various fragments of speech that might be represented by the frame
in the centre of the window. It achieved record-breaking results on a
standard speech recognition benchmark that used a small vocabulary38 and was quickly developed to give record-breaking results on
a large vocabulary task39. By 2012, versions of the deep net from 2009
were being developed by many of the major speech groups6 and were
already being deployed in Android phones. For smaller data sets,
unsupervised pre-training helps to prevent overfitting40, leading to
significantly better generalization when the number of labelled examples is small, or in a transfer setting where we have lots of examples
for some source tasks but very few for some target tasks. Once deep
learning had been rehabilitated, it turned out that the pre-training
stage was only needed for small data sets.
There was, however, one particular type of deep, feedforward network that was much easier to train and generalized much better than
networks with full connectivity between adjacent layers. This was
the convolutional neural network (ConvNet)41,42. It achieved many
practical successes during the period when neural networks were out
of favour and it has recently been widely adopted by the computer-vision community.
Since the early 2000s, ConvNets have been applied with great success to
the detection, segmentation and recognition of objects and regions in
images. These were all tasks in which labelled data was relatively abundant, such as traffic sign recognition53, the segmentation of biological images54, particularly for connectomics55, and the detection of faces, text, pedestrians and human bodies in natural images36,50,51,56–58. A major recent practical success of ConvNets is face recognition59.
Importantly, images can be labelled at the pixel level, which will have applications in technology, including autonomous mobile robots and self-driving cars.
[Figure 3 | From image to text; the image itself is not recoverable. A deep CNN (vision) feeds a generating RNN (language), which produces captions such as "A group of people shopping at an outdoor market. There are many vegetables at the fruit stand." Caption fragment: "... with permission from ref. 102. When the RNN is given the ability to focus its attention on a different location in the input image (middle and bottom; the lighter patches were given more attention) as it generates each word (bold), we found86 that it exploits this to achieve better 'translation' of images into captions."]
Deep-learning theory shows that deep nets have two different exponential advantages over classic learning algorithms that do not use
distributed representations21. Both of these advantages arise from the
power of composition and depend on the underlying data-generating
distribution having an appropriate componential structure40. First, learning distributed representations enables generalization to new combinations of the values of learned features beyond those seen during training (for example, 2^n combinations are possible with n binary features)68,69. Second, composing layers of representation in
a deep net brings the potential for another exponential advantage70
(exponential in the depth).
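The first, combinatorial advantage is easy to check: n binary features index 2^n distinct patterns, so a representation with even a modest number of learned features can distinguish far more inputs than it has units. A minimal illustration (the value of n is arbitrary):

```python
from itertools import product

n = 10
# Every possible assignment of n binary features, enumerated explicitly.
combos = list(product([0, 1], repeat=n))
print(len(combos))   # 2**10 = 1024 distinct patterns from only 10 features
```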
The hidden layers of a multilayer neural network learn to represent the network's inputs in a way that makes it easy to predict the
target outputs. This is nicely demonstrated by training a multilayer
neural network to predict the next word in a sequence from a local
4 4 0 | NAT U R E | VO L 5 2 1 | 2 8 M AY 2 0 1 5
REVIEW INSIGHT
context of earlier words71. Each word in the context is presented to
the network as a one-of-N vector, that is, one component has a value of 1 and the rest are 0. In the first layer, each word creates a different pattern of activations, or word vectors (Fig. 4). In a language model,
the other layers of the network learn to convert the input word vectors into an output word vector for the predicted next word, which
can be used to predict the probability for any word in the vocabulary
to appear as the next word. The network learns word vectors that
contain many active components each of which can be interpreted
as a separate feature of the word, as was first demonstrated27 in the
context of learning distributed representations for symbols. These
semantic features were not explicitly present in the input. They were
discovered by the learning procedure as a good way of factorizing
the structured relationships between the input and output symbols
into multiple micro-rules. Learning word vectors turned out to also
work very well when the word sequences come from a large corpus
of real text and the individual micro-rules are unreliable71. When
trained to predict the next word in a news story, for example, the
learned word vectors for Tuesday and Wednesday are very similar, as
are the word vectors for Sweden and Norway. Such representations
are called distributed representations because their elements (the
features) are not mutually exclusive and their many configurations
correspond to the variations seen in the observed data. These word
vectors are composed of learned features that were not determined
ahead of time by experts, but automatically discovered by the neural
network. Vector representations of words learned from text are now
very widely used in natural language applications14,17,72–76.
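The word vectors in the article are learned by a network trained to predict the next word. As a lightweight stand-in, the sketch below builds context-count vectors over a tiny invented corpus, which exhibits the same qualitative effect: words used in similar contexts (such as "tuesday" and "wednesday") end up with similar vectors. The corpus and window size are assumptions for illustration only:

```python
from collections import Counter
from math import sqrt

corpus = ("the market fell on tuesday . the market rose on wednesday . "
          "sweden exports steel . norway exports oil . "
          "he flew to sweden . she flew to norway .").split()

def context_vector(word, window=2):
    # Count the words appearing within `window` positions of each occurrence.
    counts = Counter()
    for i, w in enumerate(corpus):
        if w == word:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(corpus[lo:i] + corpus[i + 1:hi])
    return counts

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

v = {w: context_vector(w) for w in ["tuesday", "wednesday", "sweden", "steel"]}
print(cosine(v["tuesday"], v["wednesday"]))   # higher: similar contexts
print(cosine(v["tuesday"], v["steel"]))       # lower: different contexts
```

The elements of these vectors are not mutually exclusive features, and nothing about "tuesday" and "wednesday" being related was specified ahead of time; it falls out of the distribution of contexts, which is the sense in which the representations are distributed and discovered rather than designed.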
The issue of representation lies at the heart of the debate between
the logic-inspired and the neural-network-inspired paradigms for
cognition. In the logic-inspired paradigm, an instance of a symbol is
something for which the only property is that it is either identical or
non-identical to other symbol instances. It has no internal structure
that is relevant to its use; and to reason with symbols, they must be
bound to the variables in judiciously chosen rules of inference. By
contrast, neural networks just use big activity vectors, big weight
matrices and scalar non-linearities to perform the type of fast intuitive inference that underpins effortless commonsense reasoning.
Before the introduction of neural language models71, the standard
approach to statistical modelling of language did not exploit distributed representations: it was based on counting frequencies of occurrences of short symbol sequences of length up to N (called N-grams). The number of possible N-grams is on the order of V^N, where V is the vocabulary size, so taking into account a context of more than a handful of words would require very large training corpora.
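An N-gram model of this kind is just counting. The toy corpus below is invented for illustration; the last line shows how the number of possible N-grams grows as V^N:

```python
from collections import Counter

words = "the cat sat on the mat . the dog sat on the rug .".split()
V = len(set(words))                       # vocabulary size

bigrams = Counter(zip(words, words[1:]))  # counts of adjacent word pairs
unigrams = Counter(words)

def p_next(w2, w1):
    # P(w2 | w1) estimated by counting, as in a classic N-gram model
    return bigrams[(w1, w2)] / unigrams[w1]

print(p_next("sat", "cat"))   # 1.0: "cat" is always followed by "sat" here
print(V ** 2, V ** 3)         # possible bigrams vs trigrams: the V^N blow-up
```

Even with V = 8 the count table for trigrams has 512 possible entries; at a realistic vocabulary of tens of thousands of words, contexts longer than a few words become impossible to count reliably, which is exactly the limitation distributed representations avoid.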
[Figure 4 | Visualizing learned word vectors; the plot itself is not recoverable. A 2-D projection (obtained with t-SNE103) of learned word representations in which semantically related words cluster together, for example "organization", "organizations", "agencies", "Agency", "institutions", "school", "schools", "community", "companies", "company", "society" and "industry"; a similar projection of learned phrase representations groups phrases such as "the two groups", "of the two groups", "between the two", "in a few months", "a few months ago" and "within a few months".]
When backpropagation was first introduced, its most exciting use was
for training recurrent neural networks (RNNs). For tasks that involve
sequential inputs, such as speech and language, it is often better to
use RNNs (Fig. 5). RNNs process an input sequence one element at a
time, maintaining in their hidden units a state vector that implicitly
contains information about the history of all the past elements of
the sequence. When we consider the outputs of the hidden units at
different discrete time steps as if they were the outputs of different
neurons in a deep multilayer network (Fig. 5, right), it becomes clear
how we can apply backpropagation to train RNNs.
RNNs are very powerful dynamic systems, but training them has
proved to be problematic because the backpropagated gradients
either grow or shrink at each time step, so over many time steps they
typically explode or vanish77,78.
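Why the backpropagated gradients grow or shrink can be seen in a one-dimensional caricature: through the unrolled network, the gradient picks up one factor of the recurrent weight per time step, so after T steps it scales as w^T. The weights 1.1 and 0.9 below are arbitrary illustrative values:

```python
# Scalar "RNN": s_t = w * s_{t-1} + x_t.  Backpropagating through T unrolled
# steps multiplies the gradient by w once per step, so d s_T / d s_0 = w ** T.
def gradient_through_time(w, T):
    g = 1.0
    for _ in range(T):
        g *= w            # one factor of the recurrent weight per time step
    return g

print(gradient_through_time(1.1, 50))   # roughly 117: the gradient explodes
print(gradient_through_time(0.9, 50))   # roughly 0.005: the gradient vanishes
```

In the vector case w is replaced by the recurrent weight matrix and the growth or decay is governed by its leading eigenvalues, but the mechanism (repeated multiplication through time) is the same.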
Thanks to advances in their architecture79,80 and ways of training
them81,82, RNNs have been found to be very good at predicting the
next character in the text83 or the next word in a sequence75, but they
can also be used for more complex tasks. For example, after reading
an English sentence one word at a time, an English encoder network
can be trained so that the final state vector of its hidden units is a good
representation of the thought expressed by the sentence. This thought
vector can then be used as the initial hidden state of (or as extra input
to) a jointly trained French decoder network, which outputs a probability distribution for the first word of the French translation. If a
particular first word is chosen from this distribution and provided
as input to the decoder network it will then output a probability distribution for the second word of the translation and so on until a
full stop is chosen17,72,76. Overall, this process generates sequences of
French words according to a probability distribution that depends on
the English sentence. This rather naive way of performing machine
translation has quickly become competitive with the state-of-the-art,
and this raises serious doubts about whether understanding a sentence requires anything like the internal symbolic expressions that are
manipulated by using inference rules. It is more compatible with the view that everyday reasoning involves many simultaneous analogies that each contribute plausibility to a conclusion101.
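The decoding loop just described (choose a word from the predicted distribution, feed it back in, stop at the full stop) can be sketched structurally. The table below is a hypothetical stand-in for a trained decoder network, and the conditioning on the English sentence's thought vector is omitted for brevity; only the control flow is meaningful:

```python
# Toy stand-in for the decoder: previous word -> distribution over next words.
TOY_DECODER = {
    "<start>": {"je": 0.7, "tu": 0.3},
    "je":      {"pense": 0.6, "suis": 0.4},
    "pense":   {".": 0.9, "donc": 0.1},
    "suis":    {".": 1.0},
    "tu":      {"penses": 1.0},
    "penses":  {".": 1.0},
}

def greedy_decode(decoder, max_len=10):
    words, prev = [], "<start>"
    while len(words) < max_len:
        dist = decoder[prev]
        prev = max(dist, key=dist.get)   # pick the most probable next word
        if prev == ".":                  # stop when the full stop is chosen
            break
        words.append(prev)               # feed the chosen word back in
    return " ".join(words)

print(greedy_decode(TOY_DECODER))
```

Sampling from the distribution instead of taking the argmax would generate different plausible translations on different runs, which is how the process defines a probability distribution over whole output sentences.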
[Figure 5 | A recurrent neural network and the unfolding in time of the computation involved in its forward computation; the diagram itself is not recoverable. An input sequence x_{t−1}, x_t, x_{t+1} is mapped to outputs o_{t−1}, o_t, o_{t+1} through hidden states s_{t−1}, s_t, s_{t+1}, with the same weight matrices U (input to state), W (state to state) and V (state to output) reused at every time step.]
12. Leung, M. K., Xiong, H. Y., Lee, L. J. & Frey, B. J. Deep learning of the tissue-regulated splicing code. Bioinformatics 30, i121–i129 (2014).
13. Xiong, H. Y. et al. The human splicing code reveals new insights into the genetic determinants of disease. Science 347, 6218 (2015).
14. Collobert, R. et al. Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011).
15. Bordes, A., Chopra, S. & Weston, J. Question answering with subgraph embeddings. In Proc. Empirical Methods in Natural Language Processing http://arxiv.org/abs/1406.3676v3 (2014).
16. Jean, S., Cho, K., Memisevic, R. & Bengio, Y. On using very large target vocabulary for neural machine translation. In Proc. ACL-IJCNLP http://arxiv.org/abs/1412.2007 (2015).
17. Sutskever, I., Vinyals, O. & Le, Q. V. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Information Processing Systems 27 3104–3112 (2014).
This paper showed state-of-the-art machine translation results with the architecture introduced in ref. 72, with a recurrent network trained to read a sentence in one language, produce a semantic representation of its meaning, and generate a translation in another language.
18. Bottou, L. & Bousquet, O. The tradeoffs of large scale learning. In Proc. Advances in Neural Information Processing Systems 20 161–168 (2007).
19. Duda, R. O. & Hart, P. E. Pattern Classification and Scene Analysis (Wiley, 1973).
20. Schölkopf, B. & Smola, A. Learning with Kernels (MIT Press, 2002).
21. Bengio, Y., Delalleau, O. & Le Roux, N. The curse of highly variable functions for local kernel machines. In Proc. Advances in Neural Information Processing Systems 18 107–114 (2005).
22. Selfridge, O. G. Pandemonium: a paradigm for learning in mechanisation of thought processes. In Proc. Symposium on Mechanisation of Thought Processes 513–526 (1958).
23. Rosenblatt, F. The Perceptron: A Perceiving and Recognizing Automaton. Tech. Rep. 85-460-1 (Cornell Aeronautical Laboratory, 1957).
24. Werbos, P. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. PhD thesis, Harvard Univ. (1974).
25. Parker, D. B. Learning Logic. Report TR47 (MIT Press, 1985).
26. LeCun, Y. Une procédure d'apprentissage pour réseau à seuil asymétrique. In Cognitiva 85: à la Frontière de l'Intelligence Artificielle, des Sciences de la Connaissance et des Neurosciences [in French] 599–604 (1985).
27. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).
28. Glorot, X., Bordes, A. & Bengio, Y. Deep sparse rectifier neural networks. In Proc. 14th International Conference on Artificial Intelligence and Statistics 315–323 (2011).
This paper showed that supervised training of very deep neural networks is much faster if the hidden layers are composed of ReLU.
29. Dauphin, Y. et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Proc. Advances in Neural Information Processing Systems 27 2933–2941 (2014).
30. Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B. & LeCun, Y. The loss surface of multilayer networks. In Proc. Conference on AI and Statistics http://arxiv.org/abs/1412.0233 (2014).
31. Hinton, G. E. What kind of graphical model is the brain? In Proc. 19th International Joint Conference on Artificial Intelligence 1765–1775 (2005).
32. Hinton, G. E., Osindero, S. & Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comp. 18, 1527–1554 (2006).
This paper introduced a novel and effective way of training very deep neural networks by pre-training one hidden layer at a time using the unsupervised learning procedure for restricted Boltzmann machines.
33. Bengio, Y., Lamblin, P., Popovici, D. & Larochelle, H. Greedy layer-wise training of deep networks. In Proc. Advances in Neural Information Processing Systems 19 153–160 (2006).
This report demonstrated that the unsupervised pre-training method introduced in ref. 32 significantly improves performance on test data and generalizes the method to other unsupervised representation-learning techniques, such as auto-encoders.
34. Ranzato, M., Poultney, C., Chopra, S. & LeCun, Y. Efficient learning of sparse representations with an energy-based model. In Proc. Advances in Neural Information Processing Systems 19 1137–1144 (2006).
35. Hinton, G. E. & Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science 313, 504–507 (2006).
36. Sermanet, P., Kavukcuoglu, K., Chintala, S. & LeCun, Y. Pedestrian detection with unsupervised multi-stage feature learning. In Proc. International Conference on Computer Vision and Pattern Recognition http://arxiv.org/abs/1212.0142 (2013).
37. Raina, R., Madhavan, A. & Ng, A. Y. Large-scale deep unsupervised learning using graphics processors. In Proc. 26th Annual International Conference on Machine Learning 873–880 (2009).
38. Mohamed, A.-R., Dahl, G. E. & Hinton, G. Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20, 14–22 (2012).
39. Dahl, G. E., Yu, D., Deng, L. & Acero, A. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20, 33–42 (2012).
40. Bengio, Y., Courville, A. & Vincent, P. Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Machine Intell. 35, 1798–1828 (2013).
41. LeCun, Y. et al. Handwritten digit recognition with a back-propagation network. In Proc. Advances in Neural Information Processing Systems 396–404 (1990).
This is the first paper on convolutional networks trained by backpropagation.
90. Weston, J., Bordes, A., Chopra, S. & Mikolov, T. Towards AI-complete question
answering: a set of prerequisite toy tasks. http://arxiv.org/abs/1502.05698
(2015).
91. Hinton, G. E., Dayan, P., Frey, B. J. & Neal, R. M. The wake-sleep algorithm for unsupervised neural networks. Science 268, 1158–1161 (1995).
92. Salakhutdinov, R. & Hinton, G. Deep Boltzmann machines. In Proc. International Conference on Artificial Intelligence and Statistics 448–455 (2009).
93. Vincent, P., Larochelle, H., Bengio, Y. & Manzagol, P.-A. Extracting and composing robust features with denoising autoencoders. In Proc. 25th International Conference on Machine Learning 1096–1103 (2008).
94. Kavukcuoglu, K. et al. Learning convolutional feature hierarchies for visual recognition. In Proc. Advances in Neural Information Processing Systems 23 1090–1098 (2010).
95. Gregor, K. & LeCun, Y. Learning fast approximations of sparse coding. In Proc. International Conference on Machine Learning 399–406 (2010).
96. Ranzato, M., Mnih, V., Susskind, J. M. & Hinton, G. E. Modeling natural images using gated MRFs. IEEE Trans. Pattern Anal. Machine Intell. 35, 2206–2222 (2013).
97. Bengio, Y., Thibodeau-Laufer, E., Alain, G. & Yosinski, J. Deep generative stochastic networks trainable by backprop. In Proc. 31st International Conference on Machine Learning 226–234 (2014).
98. Kingma, D., Rezende, D., Mohamed, S. & Welling, M. Semi-supervised learning with deep generative models. In Proc. Advances in Neural Information Processing Systems 27 3581–3589 (2014).
99. Ba, J., Mnih, V. & Kavukcuoglu, K. Multiple object recognition with visual attention. In Proc. International Conference on Learning Representations http://arxiv.org/abs/1412.7755 (2014).
100. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
101. Bottou, L. From machine learning to machine reasoning. Mach. Learn. 94, 133–149 (2014).
102. Vinyals, O., Toshev, A., Bengio, S. & Erhan, D. Show and tell: a neural image caption generator. In Proc. International Conference on Machine Learning http://arxiv.org/abs/1502.03044 (2014).
103. van der Maaten, L. & Hinton, G. E. Visualizing data using t-SNE. J. Mach. Learn. Research 9, 2579–2605 (2008).
Acknowledgements The authors would like to thank the Natural Sciences and
Engineering Research Council of Canada, the Canadian Institute For Advanced
Research (CIFAR), the National Science Foundation and Office of Naval Research
for support. Y.L. and Y.B. are CIFAR fellows.
Author Information Reprints and permissions information is available at
www.nature.com/reprints. The authors declare no competing financial
interests. Readers are welcome to comment on the online version of this
paper at go.nature.com/7cjbaa. Correspondence should be addressed to Y.L.
(yann@cs.nyu.edu).
4 4 4 | NAT U R E | VO L 5 2 1 | 2 8 M AY 2 0 1 5
REVIEW
doi:10.1038/nature14540
Reinforcement learning is a branch of machine learning concerned with using experience gained through interacting
with the world and evaluative feedback to improve a system's ability to make behavioural decisions. It has been called 'the artificial intelligence problem in a microcosm' because learning algorithms must act autonomously to perform well and
achieve their goals. Partly driven by the increasing availability of rich data, recent years have seen exciting advances in
the theory and practice of reinforcement learning, including developments in fundamental technical areas such as generalization, planning, exploration and empirical methodology, leading to increasing applicability to real-life problems.
Reinforcement-learning algorithms1,2 are inspired by our understanding of decision making in humans and other animals in
which learning is supervised through the use of reward signals
in response to the observed outcomes of actions. As our understanding
of this class of problems improves, so does our ability to bring it to bear
in practical settings. Reinforcement learning is having a considerable
impact on nuts-and-bolts questions such as how to create more effective personalized Web experiences, as well as esoteric questions such
as how to design better computerized players for the traditional board
game Go or 1980s video games. Reinforcement learning is also providing a valuable conceptual framework for work in psychology, cognitive
science, behavioural economics and neuroscience that seeks to explain
the process of decision making in the natural world.
One way to think about machine learning is as a set of techniques
to try to answer the question: when I am in situation x, what response
should I choose? As a concrete example, consider the problem of assigning patrons to tables in a restaurant. Parties of varying sizes arrive in
an unknown order and the host allocates each party to a table. The
host maps the current situation x (size of the latest party and information about which tables are occupied and for how long) to a decision
(which table to assign to the party), trying to satisfy a set of competing
goals such as minimizing the waiting time for a table, balancing the load
among the various servers, and ensuring that the members of a party
can sit together. Similar allocation challenges come up in games such
as Tetris and computational problems such as data-centre task allocation. Viewed more broadly, this framework fits any problem in which a
sequence of decisions needs to be made to maximize a scoring function
over uncertain outcomes.
The kinds of approaches needed to learn good behaviour for the
patron-assignment problem depend on what kinds of feedback information are available to the decision maker while learning (Fig. 1).
Exhaustive versus sampled feedback is concerned with the coverage of the training examples. A learner given exhaustive feedback is
exposed to all possible situations. Sampled feedback is weaker, in that
the learner is only provided with experience of a subset of situations.
The central problem in classic supervised learning is generalizing from
sampled examples.
Supervised versus evaluative feedback is concerned with how the
learner is informed of right and wrong answers. A requirement for
applying supervised learning methods is the availability of examples
with known optimal decisions. In the patron-assignment problem, a
host-in-training could work as an apprentice to a much more experienced supervisor to learn how to handle a range of situations. If the
apprentice can only learn from supervised feedback, however, she would
have no opportunities to improve after the apprenticeship ends.
By contrast, evaluative feedback provides the learner with an assessment of the effectiveness of the decisions that she made; no information
is available on the appropriateness of alternatives. For example, a host
might learn about the ability of a server to handle unruly patrons by trial
and error: when the host makes an assignment of a difficult customer,
it is possible to tell whether things went smoothly with the selected
server, but no direct information is available as to whether one of the
other servers might have been a better choice. The central problem in
the field of reinforcement learning is addressing the challenge of evaluative feedback.
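The host's trial-and-error learning about servers is an instance of the multi-armed bandit setting, and a minimal sketch makes the evaluative-feedback constraint concrete. The code below is illustrative only (the three 'servers' and their success rates are hypothetical choices, not from the article): an epsilon-greedy learner observes the outcome of the option it picked, never the alternatives.

```python
import random

def epsilon_greedy_bandit(payoffs, n_rounds=10000, epsilon=0.1, seed=0):
    """Learn from evaluative feedback alone: after each choice we observe
    only the outcome of the option we picked, never the alternatives."""
    rng = random.Random(seed)
    n = len(payoffs)
    counts = [0] * n          # times each option was tried
    estimates = [0.0] * n     # running mean outcome per option
    for _ in range(n_rounds):
        if rng.random() < epsilon:           # explore: random option
            choice = rng.randrange(n)
        else:                                # exploit: best estimate so far
            choice = max(range(n), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < payoffs[choice] else 0.0
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates

# Three hypothetical servers with different success rates at handling
# difficult customers; the learner never sees these probabilities directly.
est = epsilon_greedy_bandit([0.3, 0.5, 0.8])
```

After enough rounds the estimates single out the best option even though no round ever reveals what the other choices would have yielded.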
One-shot versus sequential feedback is concerned with the relative timing of learning signals. Evaluative feedback can be subdivided
into whether it is provided directly for each decision or whether it has
longer-term impacts that are evaluated over a sequence of decisions.
For example, if a host needs to seat a party of 12 and there are no large
tables available, some past decision to seat a small party at a big table
might be to blame. Reinforcement learners must solve this temporal
credit assignment problem to be able to derive good behaviour in the
face of this weak sequential feedback.
From the beginning, reinforcement-learning methods have contended with all three forms of weak feedback simultaneously: sampled, evaluative and sequential feedback. As such, the problem is
considerably harder than that of supervised learning. However, methods
that can learn from weak sources of feedback are more generally applicable and can be mapped to a variety of naturally occurring problems,
as suggested by the patron-assignment problem.
The following sections describe recent advances in several sub-areas
of reinforcement learning that are expanding its power and applicability.
Bandit problems
Department of Computer Science, Brown University, Providence, Rhode Island 02912, USA.
(Figure 1 residue: a diagram relating learning settings along three axes of feedback (sampled versus exhaustive, evaluative versus supervised, and sequential versus one-shot); supervised machine learning, contextual bandits, bandits, tabular reinforcement learning and reinforcement learning occupy different corners of this space.)
represents the feedback received from the environment for the transition to x′. Classic temporal difference methods set about minimizing the difference between these quantities.
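A minimal tabular sketch of this idea (illustrative, not from the article: the random-walk environment and step size are hypothetical choices) nudges each estimate V(x) toward the one-step target r(x′) + V(x′):

```python
import random

def td0_chain(n_states=5, episodes=2000, alpha=0.1, seed=0):
    """Tabular TD(0) on a hypothetical random-walk chain: from each state
    move left or right with equal probability; reaching the right end pays
    reward 1, the left end pays 0. V[s] estimates the expected final reward."""
    rng = random.Random(seed)
    V = [0.0] * (n_states + 2)        # states 0 and n_states+1 are terminal
    for _ in range(episodes):
        s = (n_states + 1) // 2       # start each episode in the middle
        while 0 < s <= n_states:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            r = 1.0 if s2 == n_states + 1 else 0.0
            # TD error: the difference between the current estimate V[s]
            # and the one-step bootstrapped target r + V[s2]
            V[s] += alpha * (r + V[s2] - V[s])
            s = s2
    return V[1:n_states + 1]

values = td0_chain()
```

The learned values increase from left to right, approximating each state's probability of finishing at the rewarding end.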
Reinforcement learning in the face of evaluative, sampled and sequential feedback requires combining methods for temporal credit assignment with methods for generalization. Concretely, if the V values are represented with a neural network or similar representation, finding predictions such that V(x) ≈ E_x′[r(x′) + V(x′)] can lead to instability and divergence in the learning process14,15.
Recent work16 has modified the goal of learning slightly by incorporating the generalization process into the learning objective itself. Specifically, consider the goal of seeking V values such that V(x) ≈ Π E_x′[r(x′) + V(x′)], where Π is the projection of the values resulting from their representation by the generalization method. When feedback is exhaustive and no generalization is used, Π is the identity function and this goal is no different from classic temporal difference learning. However, when linear functions are used to represent values, this modified goal can be used to create provably convergent learning algorithms17.
This insight was rapidly generalized to non-linear function approximators that are smooth (locally linear)18, to the control setting19, and to
the use of eligibility traces in the learning process20. This line of work
holds a great deal of promise for creating reliable methods for learning
effective behaviour from very weak feedback.
Planning
(Figure residue: a noughts-and-crosses game tree used to illustrate planning.)
BOX 1
(Figure residue: the agent-environment loop of reinforcement learning, in which an agent observes a state s, selects an action a according to a policy π*(s), receives a reward R(s, a), and the environment transitions to a new state according to P(s′|s, a).)
Evaluation-function methods
whether its moves will lead to a win, a loss or a tie from each. In a game
such as chess, however, the size of the game tree is astronomical and the
best an algorithm can hope for is to assemble a representative sample of
boards to consider. The classic approach for dealing with this problem
is to use an evaluation function. Programs search as deeply in the game
tree as they can, but if they cannot reach the end of the game where the
outcome is known, a heuristic evaluation function is used to estimate
how the game will end. Even a moderately good guess can lead to excellent game play; in this context, evaluation-function methods have
been used in chess from the earliest days of artificial intelligence to the
decision making in the Deep Blue system that beat the World Chess
Champion Garry Kasparov22.
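The recipe above (search as deeply as the budget allows and fall back on a heuristic evaluation at the frontier) can be sketched as depth-limited minimax. The toy token-taking game below is a hypothetical stand-in for chess, chosen so the sketch stays self-contained:

```python
class CountdownGame:
    """Toy game for illustration: players alternately remove 1 or 2 tokens;
    the player who takes the last token wins."""
    def is_terminal(self, n):
        return n == 0
    def outcome(self, n, maximizing):
        # At n == 0 the previous player took the last token,
        # so the player now to move has lost.
        return -1 if maximizing else 1
    def moves(self, n):
        return [m for m in (1, 2) if m <= n]
    def next(self, n, m):
        return n - m

def minimax(state, depth, game, evaluate, maximizing=True):
    """Depth-limited minimax: search until a terminal position or until the
    depth budget runs out, then estimate the outcome with a heuristic
    evaluation function supplied by the caller."""
    if game.is_terminal(state):
        return game.outcome(state, maximizing)   # exact value: win/loss
    if depth == 0:
        return evaluate(state)                   # heuristic guess at the frontier
    children = (minimax(game.next(state, m), depth - 1, game, evaluate,
                        not maximizing)
                for m in game.moves(state))
    return max(children) if maximizing else min(children)

game = CountdownGame()
# With enough depth the search is exact: piles that are multiples of 3
# are losses for the player to move.
value = minimax(4, depth=10, game=game, evaluate=lambda s: 0.0)
```

With a small depth budget the same routine returns the heuristic estimate instead of the exact game value, which is precisely the trade chess programs make.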
A powerful variation of this idea is to use machine learning to improve the evaluation function over time. Here, temporal difference methods can be used to train the evaluation function from the outcomes of played games.
To plan, a learning algorithm needs to be able to predict the immediate outcomes of its actions. Since it can try an action and observe
the effects right away, this part of the problem can be addressed using
supervised learning methods. In model-based reinforcement-learning
approaches, experiences are used to estimate a transition model, mapping from situations and decisions to resulting situations, and then a
planning approach is used to make decisions that maximize predicted
long-term outcomes.
Model-based reinforcement-learning methods and model-free methods such as temporal differences strike a different balance between computational expense and experience efficiency: to a first approximation, model-free methods are inexpensive but slow to learn, whereas model-based methods are computationally intensive but squeeze the most out
of each experience (Box 2). In applications such as learning to manage
memory resources in a central processing unit, decisions need to be
made at the speed of a computer's clock and data are abundant, making
temporal difference learning an excellent fit28. In applications such as a
robot helicopter learning acrobatic tricks, experience is slow to collect
and there is ample time offline to compute decision policies, making a
model-based approach appropriate29. There are also methods that draw
on both styles of learning30 to try to get the complementary advantages.
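A compact sketch of the model-based recipe (estimate a transition and reward model from experience, then plan against it) on a hypothetical two-state world; the simulator interface and all numbers are illustrative assumptions, not a standard API:

```python
import random
from collections import defaultdict

def model_based_rl(sample_env, states, actions, samples_per_pair=500,
                   gamma=0.9, seed=0):
    """Model-based sketch: first fit a transition and reward model from
    experience (a supervised-style estimation problem), then plan against
    the learned model with value iteration. `sample_env(s, a, rng) -> (s2, r)`
    is an assumed simulator interface."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    # 1. Collect experience and estimate the model from counts.
    for _ in range(samples_per_pair):
        for s in states:
            for a in actions:
                s2, r = sample_env(s, a, rng)
                counts[(s, a)][s2] += 1
                rewards[(s, a)].append(r)
    def P(s, a, s2):
        return counts[(s, a)][s2] / sum(counts[(s, a)].values())
    def R(s, a):
        return sum(rewards[(s, a)]) / len(rewards[(s, a)])
    # 2. Plan in the learned model: value iteration towards a fixed point.
    V = {s: 0.0 for s in states}
    for _ in range(200):
        V = {s: max(R(s, a) + gamma * sum(P(s, a, s2) * V[s2] for s2 in states)
                    for a in actions)
             for s in states}
    return V

def toy_env(s, a, rng):
    # Hypothetical two-state world: action 'stay' in 'B' earns reward 1;
    # everything else earns 0. Deterministic, so `rng` goes unused here.
    if s == 'A':
        return ('B', 0.0) if a == 'go' else ('A', 0.0)
    return ('B', 1.0) if a == 'stay' else ('A', 0.0)

V = model_based_rl(toy_env, states=['A', 'B'], actions=['stay', 'go'])
```

The planning step is the computationally heavy part, but the learned model lets every collected transition be reused indefinitely, which is exactly the trade-off the text describes.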
Model-based algorithms can make use of insights from supervised learning to deal with sampled feedback when learning their transition models.
BOX 2
BOX 3
Empirical methods
An important turning point in the development of practical supervised learning methods was a shift from artificial learning problems
to learning problems grounded in measured data and careful experimental design34. The reinforcement-learning community is showing
signs of a similar evolution, which promises to help to solidify the
technical gains described in the previous sections. Of particular interest are new evaluation methodologies that go beyond case studies of
individual algorithm learning in individual environments to controlled studies of different learning algorithms and their performance
across multiple natural domains. In supervised learning, data-set collections such as the University of California, Irvine Machine Learning
Repository (http://archive.ics.uci.edu/ml/) and the Weka Machine
Learning library (http://www.cs.waikato.ac.nz/ml/weka/) facilitate
this kind of research by providing data for learning problems and a
suite of algorithms in a modular form. A challenge for the study of
reinforcement learning, however, is that learning takes place through
active exploration between the learner and her environment. Environments are almost always represented by simulators, which can be
difficult to standardize and share, and are rarely realistic stand-ins
for the actual environment of interest.
One noteworthy exception is the Arcade Learning Environment35, which provides a reinforcement-learning-friendly interface for an
eclectic collection of more than 50 Atari 2600 video games. Although video games are by no means the only domain of interest, the diversity of game types and the unity of the interface make the Arcade Learning Environment an exciting platform for the empirical study of
machine reinforcement learning. An extremely impressive example of this is a system that combined reinforcement learning with deep learning to achieve human-level play on many of these games36.
(Figure residue, apparently from Box 3: a learner that returns a prediction f(x), such as 'yes', 'no' or a number, or admits 'I don't know'.)
a good choice. Longer learning periods and contextual decisions make
the number of possibilities to consider astronomically high. Despite
this daunting combinatorial explosion, offline evaluation procedures
for contextual bandit problems have been proposed38. The key idea is to
collect a static (but enormous) set of contextual bandit decisions by running a uniform random selection algorithm on a real problem (a news
article recommendation, for example). Then, to test a bandit algorithm
using this collection, present the algorithm with the same series of contexts. Whenever the selection made by the algorithm does not match
the selection made during data collection, the algorithm is forced to
forget what it saw. Thus, as far as the algorithm knows, all of its choices
during learning matched the choices made during data collection. The
end result is an unbiased evaluation of a dynamic bandit algorithm using
static pre-recorded data.
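The replay procedure just described can be sketched in a few lines; the logged data, contexts and policies below are simulated purely for illustration:

```python
import random

def replay_evaluate(policy, logged):
    """Offline replay evaluation in the spirit of the procedure described
    above (ref. 38): step through logged (context, action, reward) triples
    collected under uniform random action selection, skip rounds where the
    candidate policy disagrees with the log, and average the rewards of the
    rounds that remain. The skipped rounds are invisible to the policy, so
    its experience looks as if it had made every choice itself."""
    total, kept = 0.0, 0
    for context, logged_action, reward in logged:
        if policy.choose(context) == logged_action:
            policy.update(context, logged_action, reward)
            total += reward
            kept += 1
    return total / max(kept, 1)

class FixedPolicy:
    """Hypothetical non-learning policy used only for demonstration."""
    def __init__(self, rule):
        self.rule = rule
    def choose(self, context):
        return self.rule(context)
    def update(self, context, action, reward):
        pass  # a learning policy would refine its estimates here

# Simulated log: two contexts, two actions chosen uniformly at random,
# reward 1 whenever the action happens to match the context.
rng = random.Random(0)
logged = [(c, a, 1.0 if a == c else 0.0)
          for c, a in ((rng.randrange(2), rng.randrange(2))
                       for _ in range(4000))]

score_good = replay_evaluate(FixedPolicy(lambda c: c), logged)
score_poor = replay_evaluate(FixedPolicy(lambda c: 0), logged)
```

Because the logging policy chose uniformly at random, the average over kept rounds is an unbiased estimate of the candidate policy's online reward.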
One reason that this approach works for contextual bandit problems
is that apart from the state of the learning algorithm there are no
dependencies from one round of decision making to the next. So, skipping many rounds because the algorithm and the data collection do not
match does not change the fundamental character of the learning problem. In the presence of sequential feedback, however, this trick no longer
applies. Proposals have been made for creating reinforcement-learning
data repositories39 and for evaluating policies using observational data
sets40, but no solid solution for evaluating general learning algorithms
has yet emerged.
Finding appropriate reward functions
Reinforcement-learning systems strive to identify behaviour that maximizes expected total reward. There is a sense, therefore, that they are
programmable through their reward functions: the mappings from
situation to score. A helpful analogy to consider is between learning
and program interpretation in digital computers: a program (reward
function) directs an interpreter (learning algorithm) to process inputs
(environments) into desirable outputs (behaviour). The vast majority
of research in reinforcement learning has been into the development
of effective interpreters with little attention paid to how we should be
programming them (providing reward functions).
There are a few approaches that have been explored for producing
reward functions that induce a target behaviour given some conception
of that behaviour. As a simple example, if one reward function for the
target behaviour is known, the space of behaviour-preserving transformations to this reward function is well understood41; other, essentially
equivalent, reward functions can be created. If a human teacher is available who can provide evaluative feedback, modified reinforcement-learning algorithms can be used to search for the target behaviour42–44. The problem of inverse reinforcement learning45,46
addresses the challenge of creating appropriate reward functions given
the availability of behavioural traces from an expert executing the target
behaviour. It is called inverse reinforcement learning because the learner
is given behaviour and needs to generate a reward function instead of
the other way around. These methods infer a reward function that the
demonstrated behaviour optimizes. One can turn to evolutionary optimization47 to generate reward functions if there is an effective way to
evaluate the appropriateness of the resulting behaviour. One advantage of
the evolutionary approach is that, among the many possible reward functions that generate good behaviour, it can identify ones that provide helpful but not too distracting hints that can speed up the learning process.
Looking forward, new techniques for specifying complex behaviour and
translations of these specifications into appropriate reward functions are
essential. Existing reward-function specifications lack the fundamental
ideas that enable the design of today's massive software systems, such as
abstraction, modularity and encapsulation. Analogues of these ideas could
greatly extend the practical utility of reinforcement-learning systems.
Reinforcement learning as a cognitive model
The term 'reinforcement learning' itself originated in the animal-learning community. It had long been observed that certain stimuli,
11. Bubeck, S. & Liu, C.-Y. Prior-free and prior-dependent regret bounds for Thompson sampling. In Proc. Advances in Neural Information Processing Systems 638–646 (2013).
12. Gershman, S. & Blei, D. A tutorial on Bayesian nonparametric models. J. Math. Psychol. 56, 1–12 (2012).
13. Sutton, R. S. Learning to predict by the method of temporal differences. Mach. Learn. 3, 9–44 (1988).
14. Boyan, J. A. & Moore, A. W. Generalization in reinforcement learning: safely approximating the value function. In Proc. Advances in Neural Information Processing Systems 369–376 (1995).
15. Baird, L. Residual algorithms: reinforcement learning with function approximation. In Proc. 12th International Conference on Machine Learning (eds Prieditis, A. & Russell, S.) 30–37 (Morgan Kaufmann, 1995).
16. Sutton, R. S. et al. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proc. 26th Annual International Conference on Machine Learning 993–1000 (2009).
17. Sutton, R. S., Maei, H. R. & Szepesvári, C. A convergent O(n) temporal-difference algorithm for off-policy learning with linear function approximation. In Proc. Advances in Neural Information Processing Systems 1609–1616 (2009).
18. Maei, H. R. et al. Convergent temporal-difference learning with arbitrary smooth function approximation. In Proc. Advances in Neural Information Processing Systems 1204–1212 (2009).
19. Maei, H. R., Szepesvári, C., Bhatnagar, S. & Sutton, R. S. Toward off-policy learning control with function approximation. In Proc. 27th International Conference on Machine Learning 719–726 (2010).
20. van Hasselt, H., Mahmood, A. R. & Sutton, R. S. Off-policy TD(λ) with a true online equivalence. In Proc. 30th Conference on Uncertainty in Artificial Intelligence 324 (2014).
21. Russell, S. J. & Norvig, P. Artificial Intelligence: A Modern Approach (Prentice Hall, 1994).
22. Campbell, M., Hoane, A. J. & Hsu, F. H. Deep Blue. Artif. Intell. 134, 57–83 (2002).
23. Samuel, A. L. Some studies in machine learning using the game of checkers. IBM J. Res. Develop. 3, 211–229 (1959).
24. Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215–219 (1994).
This article describes the first reinforcement-learning system to solve a truly non-trivial task.
25. Tesauro, G., Gondek, D., Lenchner, J., Fan, J. & Prager, J. M. Simulation, learning, and optimization techniques in Watson's game strategies. IBM J. Res. Develop. 56, 1–11 (2012).
26. Kocsis, L. & Szepesvári, C. Bandit based Monte-Carlo planning. In Proc. 17th European Conference on Machine Learning 282–293 (2006).
This article introduces UCT, the decision-making algorithm that revolutionized gameplay in Go.
27. Gelly, S. et al. The grand challenge of computer Go: Monte Carlo tree search and extensions. Communications of the ACM 55, 106–113 (2012).
28. İpek, E., Mutlu, O., Martínez, J. F. & Caruana, R. Self-optimizing memory controllers: a reinforcement learning approach. In Proc. 35th International Symposium on Computer Architecture 39–50 (2008).
29. Ng, A. Y., Kim, H. J., Jordan, M. I. & Sastry, S. Autonomous helicopter flight via reinforcement learning. In Proc. Advances in Neural Information Processing Systems http://papers.nips.cc/paper/2455-autonomous-helicopter-flight-via-reinforcement-learning (2003).
30. Sutton, R. S. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proc. 7th International Conference on Machine Learning 216–224 (Morgan Kaufmann, 1990).
31. Kearns, M. J. & Singh, S. P. Near-optimal reinforcement learning in polynomial time. Mach. Learn. 49, 209–232 (2002).
This article provides the first algorithm and analysis that shows that reinforcement-learning tasks can be solved approximately optimally with a relatively small amount of experience.
32. Brafman, R. I. & Tennenholtz, M. R-MAX: a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res. 3, 213–231 (2002).
33. Li, L., Littman, M. L., Walsh, T. J. & Strehl, A. L. Knows what it knows: a framework for self-aware learning. Mach. Learn. 82, 399–443 (2011).
34. Langley, P. Machine learning as an experimental science. Mach. Learn. 3, 5–8 (1988).
35. Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).
36. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
This article describes the application of deep learning in a reinforcement-learning setting to address the challenging task of decision making in an arcade environment.
37. Murphy, S. A. An experimental design for the development of adaptive treatment strategies. Stat. Med. 24, 1455–1481 (2005).
38. Li, L., Chu, W., Langford, J. & Wang, X. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In Proc. 4th ACM International Conference on Web Search and Data Mining 297–306 (2011).
39. Nouri, A. et al. A novel benchmark methodology and data repository for real-life reinforcement learning. In Proc. Multidisciplinary Symposium on Reinforcement Learning, Poster (2009).
40. Marivate, V. N., Chemali, J., Littman, M. & Brunskill, E. Discovering multimodal characteristics in observational clinical data. In Proc. Machine Learning for Clinical Data Analysis and Healthcare NIPS Workshop http://paul.rutgers.edu/~vukosi/papers/nips2013workshop.pdf (2013).
41. Ng, A. Y., Harada, D. & Russell, S. Policy invariance under reward transformations: theory and application to reward shaping. In Proc. 16th International Conference on Machine Learning 278–287 (1999).
42. Thomaz, A. L. & Breazeal, C. Teachable robots: understanding human teaching behaviour to build more effective robot learners. Artif. Intell. 172, 716–737 (2008).
43. Knox, W. B. & Stone, P. Interactively shaping agents via human reinforcement: the TAMER framework. In Proc. 5th International Conference on Knowledge Capture 9–16 (2009).
44. Loftin, R. et al. A strategy-aware technique for learning behaviors from discrete human feedback. In Proc. 28th Association for the Advancement of Artificial Intelligence Conference https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8579 (2014).
45. Ng, A. Y. & Russell, S. Algorithms for inverse reinforcement learning. In Proc. International Conference on Machine Learning 663–670 (2000).
46. Babes, M., Marivate, V. N., Littman, M. L. & Subramanian, K. Apprenticeship learning about multiple intentions. In Proc. International Conference on Machine Learning 897–904 (2011).
47. Singh, S., Lewis, R. L., Barto, A. G. & Sorg, J. Intrinsically motivated reinforcement learning: an evolutionary perspective. IEEE Trans. Auto. Mental Dev. 2, 70–82 (2010).
48. Newell, A. The chess machine: an example of dealing with a complex task by adaptation. In Proc. Western Joint Computer Conference 101–108 (1955).
49. Minsky, M. L. Some methods of artificial intelligence and heuristic programming. In Proc. Symposium on the Mechanization of Thought Processes 24–27 (1958).
50. Sutton, R. S. & Barto, A. G. Toward a modern theory of adaptive networks: expectation and prediction. Psychol. Rev. 88, 135–170 (1981).
51. Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593–1599 (1997).
52. Dayan, P. & Niv, Y. Reinforcement learning and the brain: the good, the bad and the ugly. Curr. Opin. Neurobiol. 18, 185–196 (2008).
53. Niv, Y. Neuroscience: dopamine ramps up. Nature 500, 533–535 (2013).
54. Cushman, F. Action, outcome, and value: a dual-system framework for morality. Pers. Soc. Psychol. Rev. 17, 273–292 (2013).
55. Shapley, L. Stochastic games. Proc. Natl Acad. Sci. USA 39, 1095–1100 (1953).
56. Bellman, R. Dynamic Programming (Princeton Univ. Press, 1957).
57. Kober, J., Bagnell, J. A. & Peters, J. Reinforcement learning in robotics: a survey. Int. J. Rob. Res. 32, 1238–1274 (2013).
58. Watkins, C. J. C. H. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).
This article introduces the first provably correct approach to reinforcement learning for both prediction and decision making.
59. Jaakkola, T., Jordan, M. I. & Singh, S. P. Convergence of stochastic iterative dynamic programming algorithms. In Advances in Neural Information Processing Systems 6, 703–710 (Morgan Kaufmann, 1994).
60. Diuk, C., Li, L. & Leffner, B. R. The adaptive k-meteorologists problem and its application to structure learning and feature selection in reinforcement learning. In Proc. 26th International Conference on Machine Learning 32–40 (2009).
Acknowledgements
The author appreciates his discussions with his colleagues that led to this synthesis
of current work.
Author Information Reprints and permissions information is available at
www.nature.com/reprints. The author declares no competing financial
interests. Readers are welcome to comment on the online version of this paper
at go.nature.com/csIqwl. Correspondence should be addressed to M.L.L.
(mlittman@cs.brown.edu).
REVIEW
doi:10.1038/nature14541
How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines
that learn from data acquired through experience. The probabilistic framework, which describes how to represent and
manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning,
robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization,
data compression and automatic model discovery.
The key idea behind the probabilistic framework to machine learning is that learning can be thought of as inferring plausible models
to explain observed data. A machine can use such models to make
predictions about future data, and take decisions that are rational given
these predictions. Uncertainty plays a fundamental part in all of this.
Observed data can be consistent with many models, and therefore which
model is appropriate, given the data, is uncertain. Similarly, predictions
about future data and the future consequences of actions are uncertain.
Probability theory provides a framework for modelling uncertainty.
This Review starts with an introduction to the probabilistic approach
to machine learning and Bayesian inference, and then discusses some of
the state-of-the-art advances in the field. Many aspects of learning and
intelligence crucially depend on the careful probabilistic representation of
uncertainty. Probabilistic approaches have only recently become a mainstream approach to artificial intelligence1, robotics2 and machine learning3,4. Even now, there is controversy in these fields about how important
it is to fully represent uncertainty. For example, advances using deep neural
networks to solve challenging pattern-recognition problems such as speech
recognition5, image classification6,7, and prediction of words in text8, do not
overtly represent the uncertainty in the structure or parameters of those
neural networks. However, my focus will not be on these types of pattern-recognition problems, characterized by the availability of large amounts
of data, but on problems for which uncertainty is really a key ingredient,
for example where a decision may depend on the amount of uncertainty.
I highlight five areas of current research at the frontier of probabilistic
machine learning, emphasizing areas that are of broad relevance to scientists across many fields: probabilistic programming, which is a general
framework for expressing probabilistic models as computer programs
and which could have a major impact on scientific modelling; Bayesian optimization, which is an approach to globally optimizing unknown
functions; probabilistic data compression; automating the discovery of
plausible and interpretable models from data; and hierarchical modelling
for learning many related models, for example for personalized medicine
or recommendation. Although considerable challenges remain, the coming decade promises substantial advances in artificial intelligence and
machine learning based on the probabilistic framework.
At the most basic level, machine learning seeks to develop methods for
computers to improve their performance at certain tasks on the basis of observed data.
Department of Engineering, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, UK.
There is often uncertainty about even the general structure of the model: is linear regression or a neural network appropriate; if the latter, how many layers should it have; and so on.
The probabilistic approach to modelling uses probability theory to
express all forms of uncertainty9. Probability theory is the mathematical
language for representing and manipulating uncertainty10, in much the
same way as calculus is the language for representing and manipulating
rates of change. Fortunately, the probabilistic approach to modelling is
conceptually very simple: probability distributions are used to represent
all the uncertain unobserved quantities in a model (including structural,
parametric and noise-related) and how they relate to the data. Then the
basic rules of probability theory are used to infer the unobserved quantities given the observed data. Learning from data occurs through the
transformation of the prior probability distributions (defined before
observing the data), into posterior distributions (after observing data).
The application of probability theory to learning from data is called
Bayesian learning (Box 1).
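As a worked example of the prior-to-posterior transformation (a textbook beta-binomial case, not drawn from this article), consider inferring the unknown bias of a coin:

```python
def posterior_beta(prior_a, prior_b, heads, tails):
    """The simplest instance of Bayesian learning: a Beta(a, b) prior over
    a coin's unknown bias, combined with binomial data, yields a
    Beta(a + heads, b + tails) posterior by conjugacy."""
    return prior_a + heads, prior_b + tails

def beta_mean(a, b):
    # Posterior mean of the bias under a Beta(a, b) distribution.
    return a / (a + b)

# A uniform Beta(1, 1) prior transformed by observing 7 heads and 3 tails.
a, b = posterior_beta(1, 1, heads=7, tails=3)
```

The observed data reshape the prior into a posterior concentrated near the empirical frequency, with the prior acting as two 'pseudo-observations' that regularize the estimate.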
Apart from its conceptual simplicity, there are several appealing properties of the probabilistic framework for machine intelligence. Simple
probability distributions over single or a few variables can be composed to form the building blocks of larger, more complex models. The
dominant paradigm in machine learning over the past two decades for
representing such compositional probabilistic models has been graphical models11, with variants including directed graphs (also known as
Bayesian networks and belief networks), undirected graphs (also known
as Markov networks and random fields), and mixed graphs with both
directed and undirected edges (Fig. 1). As discussed later, probabilistic
programming offers an elegant way of generalizing graphical models,
allowing a much richer representation of models. The compositionality
of probabilistic models means that the behaviour of these building blocks
in the context of the larger model is often much easier to understand
than, say, what will happen if one couples a non-linear dynamical system
(for example, a recurrent neural network) to another. In particular, for
a well-defined probabilistic model, it is always possible to generate data
from the model; such imaginary data provide a window into the mind
of the probabilistic model, helping us to understand both the initial prior
assumptions and what the model has learned at any later stage.
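Generating such imaginary data is ancestral sampling: each variable is drawn from its conditional distribution given its parents. A sketch with a small directed model; all conditional probabilities below are illustrative assumptions, not values from the Review.

```python
import random

# Ancestral sampling from a small directed probabilistic model, as a
# window into its prior assumptions. All probabilities are made up.

def sample_model(rng):
    predisposition = rng.random() < 1e-4          # rare root cause
    p_disease = 0.1 if predisposition else 1e-6   # disease given parent
    disease = rng.random() < p_disease
    p_symptom = 0.8 if disease else 0.01          # symptom given disease
    symptom = rng.random() < p_symptom
    return predisposition, disease, symptom

rng = random.Random(0)
data = [sample_model(rng) for _ in range(100000)]
# Imaginary data reveal what the prior believes, for example how often
# the model expects a symptom to appear at all:
print(sum(s for _, _, s in data) / len(data))
```

Inspecting such samples before seeing real data is a quick check that the prior assumptions are sensible.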
Probabilistic modelling also has some conceptual advantages over
alternatives because it is a normative theory for learning in artificially
intelligent systems. How should an artificially intelligent system represent
and update its beliefs about the world in light of data? The Cox axioms
define some desiderata for representing beliefs; a consequence of these
axioms is that degrees of belief, ranging from impossible to absolutely certain, must follow all the rules of probability theory10,12,13. This justifies
the use of subjective Bayesian probabilistic representations in artificial
intelligence. An argument for Bayesian representations in artificial intelligence that is motivated by decision theory is given by the Dutch book
theorem. The argument rests on the idea that the strength of beliefs of an
agent can be assessed by asking the agent whether it would be willing to
accept bets at various odds (ratios of payoffs). The Dutch book theorem
states that unless an artificial intelligence system's (or, for that matter, a human's) degrees of belief are consistent with the rules of probability, it will be willing to accept bets that are guaranteed to lose money14. Because
of the force of these and many other arguments on the importance of a
principled handling of uncertainty for intelligence, Bayesian probabilistic
modelling has emerged not only as the theoretical foundation for rationality in artificial intelligence systems, but also as a model for normative
behaviour in humans and animals15–18 (but see refs 19, 20 for a discussion),
and much research is devoted to understanding how neural circuitry may
be implementing Bayesian inference21,22.
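The Dutch book argument admits a one-line numerical illustration. Suppose an agent's (incoherent, assumed) beliefs assign probability 0.6 both to an event and to its complement:

```python
# Dutch book sketch: an agent whose degrees of belief violate the rules
# of probability accepts a set of bets with guaranteed loss.
# Incoherent beliefs (illustrative): P(A) = 0.6 and P(not A) = 0.6,
# which sum to 1.2 instead of 1.

belief_A, belief_not_A = 0.6, 0.6

# The agent pays its degree of belief for a ticket worth 1 if the
# corresponding event occurs, so it buys both tickets for 1.2.
cost = belief_A + belief_not_A
for a_occurs in (True, False):
    payoff = 1.0  # exactly one of the two tickets pays off
    print(a_occurs, round(payoff - cost, 10))  # net = -0.2 either way
```

Coherent beliefs (summing to 1) make such a guaranteed-loss book impossible, which is the decision-theoretic content of the theorem.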
Although conceptually simple, a fully probabilistic approach to
machine learning poses a number of computational and modelling challenges. Computationally, the main challenge is that learning involves marginalizing (summing out) all the variables in the model except for the
variables of interest (Box 1). Such high-dimensional sums and integrals are generally computationally hard, in the sense that for many models no exact polynomial-time algorithm for computing them is known.
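The cost of brute-force marginalization can be seen directly: summing a joint distribution over n binary nuisance variables takes 2^n terms. A sketch with an arbitrary stand-in for a model's joint:

```python
from itertools import product

# Marginalization by brute force: sum the joint over every setting of
# the nuisance variables. The joint below is an arbitrary positive
# function standing in for a real model, to expose the exponential cost.

def unnormalized_joint(assignment):
    return 1.0 + sum(assignment)

def marginal_of_first(n):
    terms = 0
    total = {0: 0.0, 1: 0.0}
    for rest in product((0, 1), repeat=n - 1):
        for x in (0, 1):
            total[x] += unnormalized_joint((x,) + rest)
            terms += 1
    z = total[0] + total[1]
    return {x: total[x] / z for x in total}, terms

dist, terms = marginal_of_first(10)
print(terms)  # 2**10 = 1024 terms for just 10 binary variables
```

Doubling the number of variables squares the work, which is why approximate inference methods such as Monte Carlo sampling and variational approximations are central to the field.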
BOX 1
28 May 2015 | Vol 521 | Nature | 453
INSIGHT REVIEW
[Box 1 figure: a directed graphical model in which a genetic predisposition (P(gen pred = T) = 10⁻⁴) influences a rare disease, which in turn influences observed genetic markers and a symptom.]
One of the lessons of modern machine learning is that the best predictive performance is often obtained from highly flexible learning systems,
especially when learning from large data sets. Flexible models can make
better predictions because to a greater extent they allow data to speak
for themselves. (But note that all predictions involve assumptions and
therefore the data are never exclusively speaking for themselves.) There
are essentially two ways of achieving flexibility. The model could have
a large number of parameters compared with the data set (for example, the neural network recently used to achieve translations of English
and French sentences that approached the accuracy of state-of-the-art
methods is a probabilistic model with 384 million parameters29). Alternatively, the model can be defined using non-parametric components.
The best way to understand non-parametric models is through comparison with parametric ones. In a parametric model, there is a fixed, finite number of parameters, and no matter how much training data
are observed, all the data can do is set these finitely many parameters
that control future predictions. By contrast, non-parametric approaches
have predictions that grow in complexity with the amount of training
data, either by considering a nested sequence of parametric models with
increasing numbers of parameters or by starting out with a model with
infinitely many parameters. For example, in a classification problem,
whereas a linear (parametric) classifier will always make predictions
using a linear boundary between classes, a non-parametric classifier can
learn a non-linear boundary whose shape becomes more complex with
more data. Many non-parametric models can be derived starting from
a parametric model and considering what happens as the model grows
to the limit of infinitely many parameters30. Clearly, fitting a model with
infinitely many parameters to finite training data would result in overfitting, in the sense that the model's predictions might reflect quirks
of the training data rather than regularities that can be generalized to
test data. Fortunately, Bayesian approaches are not prone to this kind
of overfitting since they average over, rather than fit, the parameters
(Box 1). Moreover, for many applications we have such huge data sets
that the main concern is underfitting from the choice of an overly simplistic parametric model, rather than overfitting.
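The parametric/non-parametric contrast in miniature (an illustrative sketch, not a method from this Review): a one-parameter threshold classifier versus a 1-nearest-neighbour rule whose effective complexity grows with the training set.

```python
# Parametric versus non-parametric prediction, sketched minimally.
# A parametric model (a single threshold) has fixed capacity; a
# 1-nearest-neighbour classifier's decision rule grows with the data.

def parametric_predict(threshold, x):
    # One parameter, no matter how much data we have seen.
    return int(x > threshold)

def nn_predict(train, x):
    # The whole training set acts as the "parameters":
    # complexity grows with the amount of data.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

train = [(0.1, 0), (0.4, 1), (0.5, 0), (0.9, 1)]  # not linearly separable in 1D
print(parametric_predict(0.5, 0.45))  # a single threshold cannot fit this pattern
print(nn_predict(train, 0.45))        # 1-NN recovers the local label
```

As more labelled points arrive, the nearest-neighbour boundary keeps adapting, while the threshold model's capacity never changes.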
A full discussion of Bayesian non-parametrics is outside the scope of
this Review (see refs 9, 31, 32 for this), but it is worth mentioning a few
of the key models. Gaussian processes are a very flexible non-parametric model for unknown functions, and are widely used for regression,
classification, and many other applications that require inference on
functions33. Consider learning a function that relates the dose of some
chemical to the response of an organism to that chemical. Instead of
modelling this relationship with, say, a linear parametric function, a
Gaussian process could be used to learn directly a non-parametric distribution of non-linear functions consistent with the data. A notable
example of a recent application of Gaussian processes is GaussianFace, a
state-of-the-art approach to face recognition that outperforms humans
and deep-learning methods34. Dirichlet processes are a non-parametric model with a long history in statistics35 and are used for density
estimation, clustering, time-series analysis and modelling the topics
of documents36. To illustrate Dirichlet processes, consider an application to modelling friendships in a social network, where each person
can belong to one of many communities. A Dirichlet process makes it
possible to have a model whereby the number of inferred communities
(that is, clusters) grows with the number of people37. Dirichlet processes
have also been used for clustering gene-expression patterns38,39. The
Indian buffet process (IBP)40 is a non-parametric model that can be
used for latent feature modelling, learning overlapping clusters, sparse
matrix factorization, or to non-parametrically learn the structure of
a deep network41. Elaborating the social network modelling example,
an IBP-based model allows each person to belong to some subset of a
large number of potential communities (for example, as defined by different families, workplaces, schools, hobbies, and so on) rather than a
single community, and the probability of friendship between two people
depends on the number of overlapping communities they have42. In this
case, the latent features of each person correspond to the communities,
which are not assumed to be observed directly. The IBP can be thought
of as a way of endowing Bayesian non-parametric models with distributed representations, as popularized in the neural network literature43.
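The Gaussian-process regression described above (the dose-response example) can be sketched concretely; this version assumes a squared-exponential kernel and invented data, and the posterior mean reduces to linear algebra on the kernel matrix.

```python
import math

# Gaussian-process regression in miniature: a squared-exponential
# kernel defines a prior over smooth functions; conditioning on a few
# dose-response observations gives a posterior mean at new doses.
# All data and hyperparameters below are assumptions for illustration.

def kernel(a, b, lengthscale=1.0, variance=1.0):
    return variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior_mean(xs, ys, x_star, noise=1e-6):
    K = [[kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)  # alpha = K^{-1} y
    return sum(kernel(x_star, a) * w for a, w in zip(xs, alpha))

doses = [0.0, 1.0, 2.0, 3.0]
responses = [0.0, 0.8, 0.9, 0.1]  # non-linear; no parametric form assumed
print(round(gp_posterior_mean(doses, responses, 1.5), 3))
```

No fixed functional form is ever written down: the kernel's lengthscale expresses a prior belief about smoothness, and the predictions interpolate the observed doses.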
An interesting link between Bayesian non-parametrics and neural networks is that, under fairly general conditions, a neural network with infinitely many hidden units is equivalent to a Gaussian process44.
[Figure 2 residue: a probabilistic program and its variables (initial, trans, states[1]–states[3], data[1]–data[3], states mean).]
Probabilistic programming
The basic idea in probabilistic programming is to use computer programs to represent probabilistic models (http://probabilistic-programming.org)45–47. One way to do this is for the computer program to define a generator for data from the probabilistic model, that is, a simulator (Fig. 2). This simulator makes calls to a random number generator in
such a way that repeated runs from the simulator would sample different
possible data sets from the model. This simulation framework is more
general than the graphical model framework described previously since
computer programs can allow constructs such as recursion (functions
calling themselves) and control flow statements (for example, if statements that result in multiple paths a program can follow), which are
difficult or impossible to represent in a finite graph. In fact, for many
of the recent probabilistic programming languages that are based on
extending Turing-complete languages (a class that includes almost all
commonly used languages), it is possible to represent any computable
probability distribution as a probabilistic program48.
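A probabilistic program in this sense is just a simulator. Recursion lets it define models, such as the branching process below, whose number of latent variables is itself random and therefore cannot be drawn as a finite graph. The stopping probability and branching rule are assumed for illustration.

```python
import random

# A probabilistic program as a simulator that calls a random number
# generator; recursion gives it models no finite graph can express.
# Toy example: a branching process whose total size is random.

def branching(rng, p_stop=0.6):
    """Recursively sample a tree of individuals; depth is unbounded."""
    if rng.random() < p_stop:
        return 1
    # Otherwise the individual spawns two branches (an assumed rule).
    return 1 + branching(rng, p_stop) + branching(rng, p_stop)

rng = random.Random(1)
sizes = [branching(rng) for _ in range(1000)]
print(min(sizes), max(sizes))  # repeated runs sample different data sets
```

Each run draws a different data set from the model, exactly the "repeated runs of the simulator" described above.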
The full potential of probabilistic programming comes from automating the process of inferring unobserved variables in the model conditioned on the observed data (Box 1). Conceptually, conditioning needs
to compute input states of the program that generate data matching the
observed data. Whereas normally we think of programs running from
inputs to outputs, conditioning involves solving the inverse problem of
inferring the inputs (in particular the random number calls) that match
a certain program output. Such conditioning is performed by a universal inference engine, usually implemented by Monte Carlo sampling
over possible executions of the simulator program that are consistent
with the observed data. The fact that defining such universal inference
algorithms for computer programs is even possible is somewhat surprising, but it is related to the generality of certain key ideas from sampling
such as rejection sampling, sequential Monte Carlo methods25 and
approximate Bayesian computation49.
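The simplest universal inference engine is rejection sampling: run the simulator many times and keep only the executions whose simulated data match the observation. A toy sketch (the model and its numbers are assumptions):

```python
import random

# Universal inference by rejection sampling over program executions:
# keep only runs of the simulator whose output matches the observation.
# The model below is an assumed toy example, not one from the Review.

def simulate(rng):
    bias = rng.choice([0.3, 0.5, 0.9])                   # latent input
    flips = sum(rng.random() < bias for _ in range(10))  # simulated data
    return bias, flips

def infer_bias(observed_flips, trials=60000, seed=0):
    rng = random.Random(seed)
    kept = [bias for bias, flips in (simulate(rng) for _ in range(trials))
            if flips == observed_flips]                  # conditioning step
    return {b: kept.count(b) / len(kept) for b in set(kept)}

posterior = infer_bias(observed_flips=9)
print(max(posterior, key=posterior.get))  # prints 0.9
```

Rejection sampling scales poorly when matches are rare, which is why practical systems use sequential Monte Carlo and related methods, but it shows why inference over arbitrary simulator programs is possible at all.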
As an example, imagine you write a probabilistic program that simulates a gene regulatory model relating unmeasured transcription factors to the expression levels of certain genes.
[Figure: Bayesian optimization. Top panels show the posterior over the unknown function f; bottom panels show the acquisition function, after t = 3 (left) and t = 4 (right) evaluations.] From the posterior over f, an acquisition function is computed (green shaded area, bottom panels), which represents the gain from evaluating the unknown function f at different x values; note that the acquisition function is high where the posterior over f has both high mean and large uncertainty. Different acquisition functions can be used, such as expected improvement or information gain. The peak of the acquisition function (red line) is the best next point to evaluate, and is therefore chosen for evaluation (red dot, new observation). The left and right panels show an example of what could happen after three and four function evaluations, respectively.
Bayesian optimization
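The loop described in the figure caption can be sketched end to end. This version assumes a squared-exponential GP surrogate, an upper-confidence-bound acquisition in place of the expected-improvement or information-gain choices mentioned above, and a grid search for the acquisition maximum.

```python
import math

# Bayesian optimization sketch (assumed details throughout): fit a GP
# surrogate to the evaluations so far, maximize an acquisition function,
# evaluate the expensive function there, and repeat.

def k(a, b):
    return math.exp(-0.5 * (a - b) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def gp_mean_var(xs, ys, x, noise=1e-6):
    K = [[k(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    kx = [k(x, a) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, kx)
    mean = sum(ki * ai for ki, ai in zip(kx, alpha))
    var = max(k(x, x) - sum(ki * vi for ki, vi in zip(kx, v)), 0.0)
    return mean, var

def f(x):  # stand-in for an expensive black-box function
    return -(x - 2.0) ** 2

xs, ys = [0.0, 4.0], [f(0.0), f(4.0)]
grid = [i / 50.0 for i in range(201)]  # candidate x values in [0, 4]
for _ in range(4):
    def ucb(x):
        # acquisition: high where the mean is high or the uncertainty large
        m, v = gp_mean_var(xs, ys, x)
        return m + 2.0 * math.sqrt(v)
    x_next = max(grid, key=ucb)
    xs.append(x_next)
    ys.append(f(x_next))
best = max(xs, key=f)
print(round(best, 2))  # prints 2.0, the true maximizer, after a few evaluations
```

Only a handful of evaluations of f are needed, which is the point when each evaluation is expensive (for example, training a machine-learning model for one hyperparameter setting).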
Data compression
probabilistic machine learning, including special compression
methods for non-sequence data such as images, graphs and other
structured objects.
[Excerpt from an automatically generated report:] This component is approximately periodic with a period of 10.7 years. Across periods the shape of this function varies smoothly with a typical length scale of 34.4 years. The shape of this function within each period is very smooth and resembles a sinusoid. This component applies until 1643 and from 1716 onwards. This component explains 72.4% of the residual variance; this increases the total variance explained from 73.0% to 92.5%. The addition of this component reduces the cross-validated MAE by 16.67% from 0.18 to 0.15.
[Figure residue from the generated report: plots of the pointwise posterior of component 4 and of the posterior of the cumulative sum of components with data (Figure 8); a model-checking table of ACF, periodogram and QQ statistics for components 1–4; and the model search tree over kernel structures (No structure → LIN, SE, PER; LIN → LIN + SE, LIN + PER, LIN × SE + SE, LIN × PER, …), with natural-language annotations such as "approximately" and "periodic function".]
The kernel components discovered are translated into English phrases (bottom right). The end result is a report with text, figures and tables, describing in detail what has been inferred about the data, including a section on model checking and criticism (top right)99,100. Data and the report are for illustrative purposes only.
an open-ended language of models82. In contrast to work on equation
learning (see for example ref. 83), the models attempt to capture general
properties of functions (for example, smoothness, periodicity or trends)
rather than a precise equation. Handling uncertainty is at the core of the
Automatic Statistician; it makes use of Bayesian non-parametrics to give
it the flexibility to obtain state-of-the-art predictive performance, and
uses the marginal likelihood (Box 1) as the metric to search the space of models.
Important earlier work includes statistical expert systems84,85 and the
Robot Scientist, which integrates machine learning and scientific discovery in a closed loop with an experimental platform in microbiology to
automate the design and execution of new experiments86. Auto-WEKA
is a recent project that automates learning classifiers, making heavy use of
the Bayesian optimization techniques already described71. Efforts to automate the application of machine-learning methods to data have recently
gained momentum, and may ultimately result in artificial intelligence
systems for data science.
Perspective
40. Griffiths, T. L. & Ghahramani, Z. The Indian buffet process: an introduction and review. J. Mach. Learn. Res. 12, 1185–1224 (2011).
This article introduced a new class of Bayesian non-parametric models for latent feature modelling.
41. Adams, R. P., Wallach, H. & Ghahramani, Z. Learning the structure of deep sparse graphical models. In Proc. 13th International Conference on Artificial Intelligence and Statistics (eds Teh, Y. W. & Titterington, M.) 1–8 (2010).
42. Miller, K., Jordan, M. I. & Griffiths, T. L. Nonparametric latent feature models for link prediction. In Proc. Advances in Neural Information Processing Systems 1276–1284 (2009).
43. Hinton, G. E., McClelland, J. L. & Rumelhart, D. E. in Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations 77–109 (MIT Press, 1986).
44. Neal, R. M. Bayesian Learning for Neural Networks (Springer, 1996).
This text derived MCMC-based Bayesian inference in neural networks and drew important links to Gaussian processes.
45. Koller, D., McAllester, D. & Pfeffer, A. Effective Bayesian inference for stochastic programs. In Proc. 14th National Conference on Artificial Intelligence 740–747 (1997).
46. Goodman, N. D. & Stuhlmüller, A. The Design and Implementation of Probabilistic Programming Languages. Available at http://dippl.org (2015).
47. Pfeffer, A. Practical Probabilistic Programming (Manning, 2015).
48. Freer, C., Roy, D. & Tenenbaum, J. B. in Turing's Legacy (ed. Downey, R.) 195–252 (2014).
49. Marjoram, P., Molitor, J., Plagnol, V. & Tavaré, S. Markov chain Monte Carlo without likelihoods. Proc. Natl Acad. Sci. USA 100, 15324–15328 (2003).
50. Mansinghka, V., Kulkarni, T. D., Perov, Y. N. & Tenenbaum, J. Approximate Bayesian image interpretation using generative probabilistic graphics programs. In Proc. Advances in Neural Information Processing Systems 26 1520–1528 (2013).
51. Bishop, C. M. Model-based machine learning. Phil. Trans. R. Soc. A 371, 20120222 (2013).
This article is a very clear tutorial exposition of probabilistic modelling.
52. Lunn, D. J., Thomas, A., Best, N. & Spiegelhalter, D. WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility. Stat. Comput. 10, 325–337 (2000).
This reports an early probabilistic programming framework widely used in statistics.
53. Stan Development Team. Stan Modeling Language User's Guide and Reference Manual, Version 2.5.0. http://mc-stan.org/ (2014).
54. Fischer, B. & Schumann, J. AutoBayes: a system for generating data analysis programs from statistical models. J. Funct. Program. 13, 483–508 (2003).
55. Minka, T. P., Winn, J. M., Guiver, J. P. & Knowles, D. A. Infer.NET 2.4. http://research.microsoft.com/infernet (Microsoft Research, 2010).
56. Wingate, D., Stuhlmüller, A. & Goodman, N. D. Lightweight implementations of probabilistic programming languages via transformational compilation. In Proc. International Conference on Artificial Intelligence and Statistics 770–778 (2011).
57. Pfeffer, A. IBAL: a probabilistic rational programming language. In Proc. International Joint Conference on Artificial Intelligence 733–740 (2001).
58. Milch, B. et al. BLOG: probabilistic models with unknown objects. In Proc. 19th International Joint Conference on Artificial Intelligence 1352–1359 (2005).
59. Goodman, N., Mansinghka, V., Roy, D., Bonawitz, K. & Tenenbaum, J. Church: a language for generative models. In Proc. Uncertainty in Artificial Intelligence 220–229 (2008).
This is an influential paper introducing the Turing-complete probabilistic programming language Church.
60. Pfeffer, A. Figaro: An Object-Oriented Probabilistic Programming Language. Tech. Rep. (Charles River Analytics, 2009).
61. Mansinghka, V., Selsam, D. & Perov, Y. Venture: a higher-order probabilistic programming platform with programmable inference. Preprint at http://arxiv.org/abs/1404.0099 (2014).
62. Wood, F., van de Meent, J. W. & Mansinghka, V. A new approach to probabilistic programming inference. In Proc. 17th International Conference on Artificial Intelligence and Statistics 1024–1032 (2014).
63. Li, L., Wu, Y. & Russell, S. J. SWIFT: Compiled Inference for Probabilistic Programs. Report No. UCB/EECS-2015-12 (Univ. California, Berkeley, 2015).
64. Bergstra, J. et al. Theano: a CPU and GPU math expression compiler. In Proc. 9th Python in Science Conference http://conference.scipy.org/proceedings/scipy2010/ (2010).
65. Kushner, H. A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Basic Eng. 86, 97–106 (1964).
66. Jones, D. R., Schonlau, M. & Welch, W. J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 13, 455–492 (1998).
67. Brochu, E., Cora, V. M. & de Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Preprint at http://arXiv.org/abs/1012.2599 (2010).
68. Hennig, P. & Schuler, C. J. Entropy search for information-efficient global optimization. J. Mach. Learn. Res. 13, 1809–1837 (2012).
69. Hernández-Lobato, J. M., Hoffman, M. W. & Ghahramani, Z. Predictive entropy search for efficient global optimization of black-box functions. In Proc. Advances in Neural Information Processing Systems 918–926 (2014).
70. Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Proc. Advances in Neural Information Processing Systems 2960–2968 (2012).
71. Thornton, C., Hutter, F., Hoos, H. H. & Leyton-Brown, K. Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms. In Proc. 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 847–855 (2013).
REVIEW
doi:10.1038/nature14542
We are witnessing the advent of a new era of robots: drones that can autonomously fly in natural and man-made
environments. These robots, often associated with defence applications, could have a major impact on civilian tasks,
including transportation, communication, agriculture, disaster mitigation and environment preservation. Autonomous
flight in confined spaces presents great scientific and technical challenges owing to the energetic cost of staying airborne
and to the perceptual intelligence required to negotiate complex environments. We identify scientific and technological
advances that are expected to translate, within appropriate regulatory frameworks, into pervasive use of autonomous
drones for civilian applications.
buildings where ground clutter is an obstacle for terrestrial robots. Coordinated teams of autonomous drones will enable missions that last longer
than the flight time of a single drone by allowing some drones to temporarily leave the team for battery replacement. Drone teams will permit rescue organizations to quickly deploy dedicated communication networks
for ground operators. Telecommunication companies can also leverage
drone networks to temporarily supplement or replace points of service.
Realizing this vision will require new mechatronic solutions and new
levels of control autonomy (Box1) to safely complete missions in confined spaces and near the ground, where absolute positioning signals and
remote control are not available or sufficiently precise. None of today's
commercial drones has sufficient control autonomy to complete any of
the missions described without skilled human supervision, which makes
those operations slow, dangerous and not scalable. We identify the most
promising scientific and technological advances that could lead to a new
generation of small autonomous drones and offer a tentative road map of
capability deployment within suitable regulatory frameworks.
1Laboratory of Intelligent Systems, École Polytechnique Fédérale de Lausanne, Station 11, Lausanne, CH 1015, Switzerland. 2Microrobotics Lab, Harvard University, 149 Maxwell Dworkin Building,
Cambridge, Massachusetts 02138, USA.
Figure 1 | Autonomous drones in rescue situations. a, Fixed-wing drones with a long flight time could provide bird's-eye-view images and a communication
network for rescuers on the ground. b, Rotorcrafts with hovering capabilities could inspect structures for cracks and leaks; and c, transport medical supplies from
nearby hospitals. d, Swarms of dispensable drones with flapping wings could enter buildings to search for chemical hazards. e, Multi-modal caged robots could fly
and roll into complex structures to safely search for signs of life.
Neither fixed-wing aircraft nor rotorcraft scale down well, both in terms of the aerodynamics that govern flight and in the performance of the components that are necessary to generate propulsion. Bio-inspired flapping-wing flight (Fig. 2c) offers an alternative paradigm that can be scaled down in size, but brings fluid-mechanics-modelling and control challenges.
Propulsion and manoeuvrability considerations
Small-scale aerial vehicles are predominantly fixed-wing propeller-driven
aircraft or rotorcraft. The latter can be further broken down into conventional primary and tail rotor configurations, coaxial dual rotors1, or
quadcopter designs, which have recently become popular. For each of
these configurations, and for propeller-driven aircraft at larger scales,
there have been only small incremental improvements to propeller design
over the past several decades, since the advent of standards and tools to
generate airfoil shapes by the National Advisory Committee for Aeronautics2. Propulsive efficiencies for rotorcraft degrade as the vehicle size is
reduced, an indicator of the energetic challenges for flight at small scales.
Smaller size typically implies lower Reynolds numbers, which in turn
suggests an increased dominance of viscous forces, causing greater drag
coefficients and reduced lift coefficients compared with larger aircraft.
To put this into perspective, this means that a scaled-down fixed-wing
aircraft would be subject to a lower lift-to-drag ratio and thereby require
greater relative forward velocity to maintain flight, with the associated
drag and power penalty reducing the overall energetic efficiency. The
impacts of scaling challenges (Fig. 3) are that smaller drones have less endurance, and that the overall flight times range from tens of seconds to tens of minutes, which compares unfavourably with human-scale vehicles.
There are, however, manoeuvrability benefits that arise from decreased
vehicle size. For example, the moment of inertia is a strong function of the vehicle's characteristic dimension, a measure of a critical length of the vehicle (such as the chord length of a wing or the length of a propeller), in a similar manner as used in Reynolds number scaling. Because the moment of inertia of the vehicle scales with the characteristic dimension, L, raised to the fifth power, a decrease in size from an 11 m wingspan, four-seat aircraft such as the Cessna 172 to a 0.05 m rotor-to-rotor separation Blade Pico QX quadcopter implies that the Cessna has about 5 × 10¹¹ times the inertia of the quadcopter (with respect to roll). Depending on the scaling law used for torque generated by the quadcopter, this leads to angular acceleration that scales as either L⁻¹ or L⁻², both cases indicating that the quadcopter will be much more manoeuvrable3. This has resulted in
remarkable demonstrations of aerobatic manoeuvres in rooms equipped
with real-time tracking and control by external computers, including flying through windows at high speeds4 and manipulating objects in flight5.
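The inertia ratio quoted above follows directly from the L⁵ scaling:

```python
# Checking the scaling argument numerically: moment of inertia scales
# as the characteristic dimension L to the fifth power, so the ratio of
# the Cessna 172's roll inertia to the Pico QX quadcopter's is
# (11 / 0.05) ** 5.
cessna_L = 11.0  # m, Cessna 172 wingspan
quad_L = 0.05    # m, Blade Pico QX rotor-to-rotor separation
ratio = (cessna_L / quad_L) ** 5
print(f"{ratio:.1e}")  # about 5e11, matching the estimate in the text
```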
This enhanced agility, often achieved at the expense of open-loop stability, requires increased emphasis on control, a challenge also exacerbated
by the size, weight and power constraints of these small vehicles. Implementing these behaviours in autonomous drones (with on-board sensing
and computation) is a significant challenge, but one that is beginning to
have promising results6,7 even for aggressive flight manoeuvres. The
challenges for further scaling down these sensing and control systems are
discussed in the final section of this Review.
Actuation, power and manufacturing
Beyond autonomy considerations, the size and type of the flying robot
also engender associated challenges with actuation and manufacturing.
For actuation, rotorcraft and propeller-driven fixed-wing drones generally use electromagnetic motors. Some flapping-wing aircraft8,9 also use
electromagnetic motors, but require a linkage mechanism to convert the
rotary motion of the motor to the flapping motion of the wings. However,
as the scale is reduced, conventional motors become inefficient, require
substantial gearing to achieve a desired wing or propeller velocity, and are
extremely difficult to manufacture. Therefore, for vehicles below a few
grams, alternative methods of actuation are required. Similarly, macro-size flying robots (tens to hundreds of grams and larger) may be constructed using conventional methods such as additive and subtractive machining and nuts-and-bolts assembly. However, for micro-size flying robots (less than a few grams), manufacturing needs to be rethought
and novel methods applied. To give a perspective on the challenges for
actuation, we need to consider how the physics of scaling10 affects the
performance of electromagnetic motors. At macroscopic scales, motors
used in electric cars can achieve an efficiency of nearly 90% with a power density of several kilowatts per kilogram, whereas at millimetre scales,
existing motors produce a few tens of watts per kilogram of power at less
than 50% efficiency11. This reduction in performance and the energetic
expense of flight at small scales means that both power source and power
distribution are important considerations12. For vehicles for which the primary propulsion system is also responsible for generating control torques,
this does not apply (for example, quadcopters). However, for fixed-wing
aircraft and some flapping-wing vehicles, most of the power budget must be allocated to staying aloft, with a proportionately smaller share for actuating control surfaces. For example, nearly 90% of
the power budget of the RoboBee (Fig. 2c) is dedicated to the primary
power actuators that generate lift to stay in the air13. As already discussed,
for power actuators, the typical choice is electromagnetic motors. There
are alternatives, including electroactive materials such as piezoelectric actuators14 or ultrasonic motors15; however, these are most appropriate
for the smaller end of the size spectrum. Control actuators have more
options at most scales, including voice coils, shape memory alloys16 and
electroactive polymers17.
Given the challenges for small-scale propulsion, an unsteady
approach to force production is potentially a viable option for small
vehicles. Unlike fixed-wing aircraft and rotorcraft that produce lift and
thrust in a quasi-steady way (either directly in the case of rotorcraft
or indirectly by the movement of air over the wings of a fixed-wing
drone), the aerodynamics of flapping-wing drones is complex and
involves the generation and manipulation of vortical structures in the
air18. As is obvious from its existence in nature, flapping-wing propulsion is an effective form of locomotion, but translating this into
functional designs is not a trivial exercise given the morphological
diversity of the flight apparatus of insects, bats and birds. Nonetheless,
drone researchers have effectively reproduced natural flight in many
different ways. Examples include robotic insects such as RoboBee19
and the four-winged DelFly8, and bird-sized drones such as the Nano
Hummingbird9 (Fig. 2c).
As size is reduced, in addition to the challenges for actuator selection
and manufacturing, there are also considerations of how to manufacture
the entire vehicle. At scales for which electromagnetic actuation and
bearing-based rotary joints are feasible, more conventional manufacturing methods are used, such as subtractive machining, additive printing
and moulding of composite materials. At small scales (for example, for aircraft of comparable size to insects and small birds), some of these
techniques fail, typically owing to limited resolution. Alternative methods have been developed; for example, those based on folding (an inherently scalable technique) have been used to create insect-sized robots,
avoiding the challenges that are inherent to macro-scale nuts-and-bolts
approaches20.
BOX 1
Control autonomy
According to the definition by the International Organization
for Standardization, robot autonomy is the ability to perform
intended tasks based on current state and sensing, without
human intervention98. This definition encompasses a wide
range of situations, which demand different levels of autonomy
depending on the type of robot and the intended use. For example,
although autonomy in tethered robots does not concern energy
management, mobile robots with long-range travel may require the
capability to decide when to abort the current mission and locate a
recharging station. In the case of the small drones discussed here,
we can identify three levels of increasing autonomy (Table 1).
Sensory-motor autonomy: translate high-level human commands (such as to reach a given altitude, perform a circular trajectory, move to global positioning system (GPS) coordinates or maintain position) into combinations of platform-dependent control signals (such as pitch, roll and yaw angles or speed); follow a pre-programmed trajectory using GPS waypoints.
Reactive autonomy (requires sensory-motor autonomy):
maintain current position or trajectory in the presence of external
perturbations, such as wind or electro-mechanical failure; avoid
obstacles; maintain a safe or predefined distance from ground;
coordinate with moving objects, including other drones; take off
and land.
Cognitive autonomy (requires reactive autonomy): perform
simultaneous localization and mapping; resolve conflicting
information; plan (for battery recharge for example); recognize
objects or persons; learn.
[Table 1 | Levels of autonomy (reconstructed from fragments; some cells are illegible in the source)]
Level of autonomy | Obstacles | Computing requirements | Human supervision | Deployment status | Validated on drone type
Sensory-motor autonomy | – | Little | Yes | Deployed | All types
Reactive autonomy | Few and sparse | Medium | Little | Partly deployed | Fixed wing, rotorcraft and flapping wing
Cognitive autonomy | – | High | None | Not yet deployed | Mostly rotorcraft
Multi-modal drones
In many situations, such as search and rescue, parcel delivery in confined spaces and environmental monitoring, it may be advantageous to
combine aerial and terrestrial capabilities. Perching mechanisms could
allow drones to land on walls21 and power lines22 in order to monitor the
environment from a high vantage point while saving energy. Agile drones
could move on the ground by using legs in conjunction with retractable23
or flapping wings24. In an effort to minimize the total cost of transport25,
which will be increased by the additional locomotion mode, these future
drones may benefit from using the same actuation system for flight control and ground locomotion. The different speed and torque requirements
of these two locomotion modes could be reconciled by adapting the wing
morphology to the specific situation26, similar to the way vampire bats
(Desmodus rotundus) use their powerful front limbs when flying or walking27. Alternatively, multi-modal locomotion could be obtained by adding
large wheels to the sides of hovering drones28, by embedding the propulsion system in a rolling cage29, or by completely decoupling the spherical
cage from the inner rotors by means of a gimbal system30 (Fig.2b); the latter design allows the drone not only to roll on the ground in any direction,
but, because the rotors are protected by a cage, also to safely collide with
obstacles or humans without attitude perturbations. Conversely, wings
could be added to ground-based robots travelling on rough terrain to
extend their jumping distances, stabilize the landing phase and reduce the
impact with the ground31. In this case as well, the total cost of transport
could be reduced by sharing the same actuation system between jumping
and wing-deployment mechanisms32. Alternatively, one could use pivoting wings, which minimize drag at take-off, improve the transition from
ballistic jump to gliding and maximize the gliding ratio33. In the future,
we may also deploy drones in semi-aquatic environments by using design
principles inspired by aquatic birds, such as folding wings for plunge diving or hydrophobic surfaces for dry flight, or by flying squid, which use
water-jet propulsion for take off34.
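The total cost of transport mentioned above25 is commonly defined as the power drawn divided by weight times speed. A minimal sketch with illustrative numbers, not measurements from any of the cited platforms:

```python
# Cost of transport (COT) quantifies locomotion efficiency as the power P
# needed to move a body of mass m at speed v: COT = P / (m * g * v).
# The figures below are illustrative, not data from the Review.

G = 9.81  # gravitational acceleration, m s^-2

def cost_of_transport(power_w: float, mass_kg: float, speed_ms: float) -> float:
    return power_w / (mass_kg * G * speed_ms)

# A hypothetical 500 g drone drawing 50 W at 10 m/s in forward flight:
cot_flight = cost_of_transport(50.0, 0.5, 10.0)
# The same platform rolling on the ground at 2 m/s on 5 W:
cot_ground = cost_of_transport(5.0, 0.5, 2.0)
# Sharing one actuation system between modes aims to keep the added mass
# (and hence the COT penalty) of the second locomotion mode small.
```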
4 6 2 | NAT U R E | VO L 5 2 1 | 2 8 M AY 2 0 1 5
REVIEW INSIGHT
[Figure residue: panel a, drone body axes (roll, pitch and yaw); scale bar, 1 cm.]
Compound eyes have fixed focus, lower resolution and smaller binocular
overlap than human eyes and therefore do not use stereo vision for distance estimation, except for very short ranges37. To safely fly in cluttered
environments, insects instead rely on image motion, also known as optic
flow38,39, generated by their own displacement relative to the surroundings40. It has been experimentally shown that their neural system reacts to
optic flow patterns41,42 to produce a large variety of flight capabilities, such
as obstacle avoidance40,43, speed maintenance44, odometry estimation45,
wall following and corridor centring46, altitude regulation47,48, orientation
control49 and landing50,51. Optic flow carries distance information only during translational movements, when its intensity is proportional to the ratio between the agent's speed and the distance from objects; during rotational movements it is instead proportional to the agent's rotational velocity and independent of distance. Because translational optic flow confounds speed and distance in a single ratio, this raises the issue of how insects can estimate ground speed and distance from obstacles at the same time52.
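The speed-to-distance ambiguity of translational optic flow can be made concrete with a small numerical sketch (illustrative geometry, assuming a point at eccentricity theta from the direction of travel):

```python
# Translational optic flow scales with the speed-to-distance ratio v/d,
# whereas rotational optic flow equals the angular rate and carries no
# distance information -- which is why the two must be separated.
# Numbers are illustrative only.

import math

def translational_flow(v: float, d: float, theta: float) -> float:
    """Angular image speed (rad/s) of a point at distance d and eccentricity
    theta from the direction of travel, for forward speed v."""
    return (v / d) * math.sin(theta)

def rotational_flow(omega: float) -> float:
    """Pure rotation shifts the whole image at the body's angular rate."""
    return omega

# Doubling speed and distance together leaves the flow unchanged, so v and
# d cannot be recovered separately from translational flow alone:
f1 = translational_flow(1.0, 2.0, math.pi / 2)
f2 = translational_flow(2.0, 4.0, math.pi / 2)
assert math.isclose(f1, f2)
```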
Many vision-based insect capabilities have been replicated with small
drones. For example, it has been shown that small fixed-wing drones53
and helicopters54 can regulate their distance from the ground using ventral
optic flow while a GPS was used to maintain constant speed and an inertial measurement unit (IMU)
was used to regulate roll angle. The addition of lateral optic flow sensors
also allowed a fixed-wing drone to detect near-ground obstacles55. Optic
flow has also been used to perform both collision-free navigation and
altitude control of indoor56 and outdoor57 fixed-wing drones without a
GPS. In these drones, the roll angle was regulated by optic flow in the
horizontal direction and the pitch angle was regulated by optic flow in
the vertical direction, while the ground speed was measured and maintained by wind-speed sensors. In this case, the rotational optic flow was
minimized by flying along straight lines interrupted by short turns or
was estimated with on-board gyroscopes and subtracted from the total
optic flow, as suggested by biological models58,59. Other authors have even
proposed bio-inspired control methods that do not require absolute speed
measurement with dedicated sensors, thus eliminating a potential source
of error. These methods consist of continuously adjusting flight speed and
altitude to maintain constant optic flow signals, which has also been suggested by biological models60. For example, altitude control and landing
was achieved by adding negative feedback from ventral optic flow, either
to the control surfaces that regulate pitch angle53 or to those that regulate
thrust61. The latter method has also been shown to be effective for altitude
control and landing on mobile platforms62 and, when used in conjunction
with lateral optic flow and lateral thrust, also for replicating flight trajectories of honeybees during indoor flight62. However, so far this method
has been validated only on tethered drones. An error correction method
has also been demonstrated on a quadcopter equipped with omnidirectional vision to fly in corridors by continuously making corrections aimed
at reducing the difference between measured optic flow and optic flow
expected from flight at the desired altitude, attitude, speed, heading and
distance from walls63. Quadcopter and flapping-wing drones are intrinsically unstable and must compensate for positional drift generated by noise
in gyroscope and accelerometer signals to hover in place and maintain
attitude. It has been shown that the direction of optic flow can be used to
reduce uncertainty in inertial sensor signals64. Wide-field optical sensors
inspired by the simple eyes of insects, called ocelli65, have been used for
attitude stabilization of flapping-wing drones66 and quadcopters67.
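The constant-optic-flow landing strategy described above can be sketched in a few lines: holding the ventral flow v/h at a set point makes height and descent speed decay together. A simplified Euler simulation under assumed parameters, not a model of any specific drone in the cited work:

```python
# Holding the ventral optic flow v/h at a constant set point makes both
# height and descent speed decay exponentially, so the vehicle touches
# down gently without ever measuring speed or height directly.
# Simple Euler integration; parameters are illustrative.

def constant_flow_landing(h0: float, flow_setpoint: float,
                          dt: float = 0.01, t_end: float = 5.0):
    h, trace = h0, []
    for _ in range(int(t_end / dt)):
        v = flow_setpoint * h      # controller keeps v/h = flow_setpoint
        h -= v * dt                # descend at the commanded speed
        trace.append((h, v))
    return trace

trace = constant_flow_landing(h0=10.0, flow_setpoint=1.0)
final_h, final_v = trace[-1]
# After five time constants the vehicle is below 1% of its initial height,
# and its descent speed has shrunk by the same factor.
```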
The need to detect optic flow over wide fields of view at high temporal
frequency in a small package has also driven the development of insect-inspired vision sensors that are smaller, lighter and faster than single-lens conventional cameras. For example, some authors55–57 have resorted to using multiple optic flow sensors found in computer optical mice, whereas others68,69 have developed neuromorphic chips that not only
extract optic flow, but also adapt to the large variety of light intensities that
can be experienced when flying in confined spaces. Similarly, specialized
micro optic flow sensors have been used for altitude regulation in sub-1 g
flying robots70. Miniature, curved, artificial compound eyes with a high
density of photoreceptors, wide fields of view and computational properties similar to insect eyes have been recently described71,72, but have not
yet been used in drones.
Coordinated flight of multiple drones is another instance of reactive
autonomy, but raises additional challenges in sensing, communication
and control73. Although very little is known about the sensing and control that underlies coordinated flight in insects, it has been shown that
[Figure 3 plot residue; only the legend is recoverable. Drone types plotted: flapping wing, rotorcraft, fixed wing. Legend: 1. Nano Hummingbird; 2. DelFly Explorer; 3. DelFly Micro; 4. H2Bird; 5. MicroBat; 6. Bionic Bird; 7. Avitron V2.0; 8. 36 cm Ornithopter; 9. 28 cm Ornithopter; 10. 15 cm Ornithopter; 11. 10 cm Ornithopter; 12. Parrot Bebop drone; 13. PD-100 Black Hornet PRS; 14. DJI Phantom 2; 15. Seiko-Epson uFR-II; 16. Ladybird V2; 17. Mini X6; 18. 350.QX2; 19. AR.Drone 2.0; 20. QR Y100; 21. QR W100S; 22. eBee; 23. Black Widow; 24. Wasp III; 25. Univ. Florida MAV; 26. H301S; 27. Diamond 600 EP; 28. EPFL MC2.]
Figure 3 | Flight time against mass of small (less than 1 kg) drones.
Examples include each of the drone types shown in Fig. 2 (fixed wing, rotary
and flapping wing). Regardless of the type, there is a clear trend in how flight
time scales with mass. Smaller drones have significantly reduced flight times
(tens of seconds compared with tens of minutes for larger drones). This is due
to actuation limitations and the physics of flight at small scales, as discussed
in this Review, and brings about challenges for all levels of autonomy
described in Box 1.
localization and mapping of quadcopters in indoor85 and outdoor environments86. However, these drones could not avoid obstacles that had not
been previously detected and mapped. Stereo vision can provide depth
information and has been used in a 4g flapping-wing vehicle to reactively avoid obstacles87 and in larger quadcopters to perform simultaneous
localization and mapping88. Furthermore, both simultaneous localization
and mapping and altitude control have been demonstrated in quadcopters with two sets of stereo cameras, one set pointing forwards and one
set pointing downwards89. Simultaneous localization and mapping algorithms have also been combined with optic flow methods to estimate
distances from the surrounding environment and stabilize the drone90.
The statistical approach to cognitive autonomy builds on high-resolution digital cameras that can provide dense clouds of data points and on
computationally expensive algorithms for reducing uncertainties. Consequently, cognitive autonomy requires heavier sensory payloads and more
powerful computational units, which may explain why, so far, progress has
been slower for small drones. Although various forms of reactive autonomy have been demonstrated in several types of drones with a wide range
of mass and flight endurance (Fig.3), simultaneous localization, mapping,
and path planning have so far been demonstrated mostly in hovering
platforms with a relatively large total mass and restricted flight endurance.
Regulatory issues
the European Commission explicitly aims to improve the public perception of drones and to promote widespread use of these technologies for
public and commercial use96.
Scaling up the use of small drones from a niche market to widespread use
in civilian applications depends on two related prerequisites: the capability to autonomously and safely manoeuvre in confined spaces and the
removal of the legal requirement of supervised operation within the line
of sight. We do not foresee major scientific or technological roadblocks
to achieving higher levels of autonomous control in research and commercial drones within the next five years. However, the legal requirement
of a certified human operator within the line of sight of every single drone
is a roadblock that will almost certainly stay for the next five years in the
United States and Europe, and removing it will depend on the reliability
and safety of small drones.
Assuming that the legal roadblock will gradually be lifted, we expect
that reactive forms of control autonomy (Box 1) will become widely
available within the next 5–10 years for small commercial drones for
long-range operation. We also anticipate that bio-inspired approaches
will dominate because they require relatively simple computation and
sensors. For example, some commercial drones (such as the eBee and
the AR.Drone 2.0 in Fig. 2) already use ventral optic flow for outdoor
landing and for indoor position stabilization. In this context, an interesting scientific challenge will be to understand how different navigation
capabilities can be integrated into a coherent control system akin to the
nervous system of a flying insect. In parallel, an important engineering
challenge will be to define test conditions and performance standards in
cooperation with governmental institutions and industrial associations
to assess the capabilities and reliability of drone technologies.
We also expect rapid progress in cognitive autonomy, which will continue to be driven by the development of artificial intelligence for smartphones capable of identifying human users, learning their behaviours
and creating representations of their environment (http://www.google.com/atap). On the one hand, face recognition and gesture-based interaction without wearable devices will become widely available for hobby and
toy drones within the next five years; for example, by equipping small
drones with human-motion-sensing devices developed by the gaming
industry. On the other hand, mapping and path planning for autonomous flight in partially unknown and changing environments will represent a challenge for small drones for at least the next ten years and will
continue to be illegal in the United States and Europe until at least 2028.
Despite these legal roadblocks, we expect an increasing demand for
small drones in civilian use because of their intrinsic safety. Kinetic
energy, one measure of the potential of a drone to cause physical harm to a human97, is linearly proportional to the drone's mass and quadratic in its velocity. Thus smaller drones are likely to cause proportionally less harm as the size is reduced; more subtly, small drones typically operate at lower speeds, dramatically decreasing the potential for harm. For example, a 500 g drone flying at 5 m s−1 has 6.25 J of kinetic energy, which is equivalent to the potential energy of a large apple dropped from about 2 m. As the mobile computing industry continues along the path of miniaturization, drone developers will
continue to reap the benefits in the form of smaller, lower power sensor
packages (for example, the IMUs used in cell phones and video game
controllers) and the slow, but steady increase in battery energy density.
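The kinetic-energy comparison above can be checked with a few lines of arithmetic; the apple mass is an assumed value chosen to make the 2 m figure come out:

```python
# Kinetic energy grows linearly with mass but quadratically with speed,
# so small, slow drones are disproportionately safer.

G = 9.81  # gravitational acceleration, m s^-2

def kinetic_energy(mass_kg: float, speed_ms: float) -> float:
    return 0.5 * mass_kg * speed_ms ** 2

def drop_height_equivalent(energy_j: float, mass_kg: float) -> float:
    """Height from which a mass would need to fall to carry that energy."""
    return energy_j / (mass_kg * G)

ke = kinetic_energy(0.5, 5.0)          # 6.25 J, the example in the text
h = drop_height_equivalent(ke, 0.32)   # a large apple (assumed ~320 g)
# h is roughly 2 m, matching the comparison in the text.
```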
Another hardware advance that will inevitably affect future drones is
the movement away from general-purpose computation in favour of
more specialized high-performance and low-power hardware accelerators tuned for the various functions needed by autonomous drones
(both reactive and cognitive autonomy). These accelerators are already
present in various mobile devices and are suggested to be a solution to
computation for the control of insect-scale drones19.
Received 17 November 2014; accepted 18 March 2015.
1. Bouabdallah, S. Design and Control of Quadrotors with Applications to
aerial–aquatic robotic platforms. Bioinspir. Biomim. 9, 031001 (2014).
35. Floreano, D., Zufferey, J.-C., Srinivasan, M. V. & Ellington, C. Flying Insects and
Robots (Springer, 2009).
This book provides an introduction to insect-inspired drones for biologists
and engineers.
36. Land, M. F. & Nilsson, D.-E. Animal Eyes (Oxford Univ. Press, 2002).
37. Srinivasan, M. V. How insects infer range from visual motion. Rev. Oculomot. Res. 5, 139–156 (1993).
38. Gibson, J. J. The Perception of the Visual World (Houghton Mifflin, 1950).
39. Koenderink, J. J. Optic flow. Vision Res. 26, 161–179 (1986).
40. Lehrer, M., Srinivasan, M. V., Zhang, S. W. & Horridge, G. A. Motion cues provide the bees' visual world with a third dimension. Nature 332, 356–357 (1988).
41. Franceschini, N., Riehle, A. & Le Nestour, A. in Facets of Vision (eds Stavenga, D. G. & Hardie, R. C.) 360–390 (Springer, 1989).
42. Krapp, H. G. & Hengstenberg, R. Estimation of self-motion by optic flow processing in single visual interneurons. Nature 384, 463–466 (1996).
43. Tammero, L. F. & Dickinson, M. H. The influence of visual landscape on the free flight behavior of the fruit fly Drosophila melanogaster. J. Exp. Biol. 205, 327–343 (2002).
44. Barron, A. & Srinivasan, M. V. Visual regulation of ground speed and headwind compensation in freely flying honey bees (Apis mellifera L.). J. Exp. Biol. 209, 978–984 (2006).
45. Srinivasan, M. V., Zhang, S. W., Altwein, M. & Tautz, J. Honeybee navigation: nature and calibration of the odometer. Science 287, 851–853 (2000).
46. Serres, J., Masson, G., Ruffier, F. & Franceschini, N. A bee in the corridor: centering and wall-following. Naturwissenschaften 95, 1181–1187 (2008).
47. Portelli, G., Ruffier, F. & Franceschini, N. Honeybees change their height to restore optic flow. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 196, 307–313 (2010).
48. Straw, A. D., Serin, L. & Dickinson, M. H. Visual control of altitude in flying Drosophila. Curr. Biol. 20, 1550–1556 (2010).
49. Egelhaaf, M. & Borst, A. A look into the cockpit of the fly: visual orientation, algorithms, and identified neurons. J. Neurosci. 13, 4563–4574 (1993).
50. Wagner, H. Flow-field variables trigger landing in flies. Nature 297, 147–148 (1982).
51. Baird, E., Boeddeker, N., Ibbotson, M. R. & Srinivasan, M. V. A universal strategy for visually guided landing. Proc. Natl Acad. Sci. USA 110, 18686–18691 (2013).
52. Taylor, G. K. & Krapp, H. G. Sensory systems and flight stability: what do insects measure and why? Adv. Insect Physiol. 34, 231–316 (2007).
53. Chahl, J. S., Srinivasan, M. V. & Zhang, S. W. Landing strategies in honeybees and applications to uninhabited airborne vehicles. Int. J. Robot. Res. 23, 101–110 (2004).
54. Garratt, M. A. & Chahl, J. S. Vision-based terrain following for an unmanned rotorcraft. J. Field Robot. 25, 284–301 (2008).
55. Griffiths, S. et al. Maximizing miniature aerial vehicles. IEEE Robot. Autom. Mag. 13, 34–43 (2006).
56. Zufferey, J.-C., Klaptocz, A., Beyeler, A., Nicoud, J.-D. & Floreano, D. A 10-gram vision-based flying robot. Adv. Robot. 21, 1671–1684 (2007).
57. Beyeler, A., Zufferey, J.-C. & Floreano, D. Vision-based control of near-obstacle flight. Auton. Robots 27, 201–219 (2009).
58. Chan, W. P., Prete, F. & Dickinson, M. H. Visual input to the efferent control system of a fly's gyroscope. Science 280, 289–292 (1998).
59. Collett, T. S. Some operating rules for the optomotor system of a hoverfly during voluntary flight. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 138, 271–282 (1980).
60. Baird, E., Srinivasan, M. V., Zhang, S. & Cowling, A. Visual control of flight speed in honeybees. J. Exp. Biol. 208, 3895–3905 (2005).
61. Ruffier, F. & Franceschini, N. Optic flow regulation: the key to aircraft automatic guidance. Robot. Auton. Syst. 50, 177–194 (2005).
62. Roubieu, F. L. et al. A biomimetic vision-based hovercraft accounts for bees' complex behaviour in various corridors. Bioinspir. Biomim. 9, 036003 (2014).
63. Conroy, J., Gremillion, G., Ranganathan, B. & Humbert, S. J. Implementation of wide-field integration of optic flow for autonomous quadrotor navigation. Auton. Robots 27, 189–198 (2009).
64. Briod, A., Zufferey, J.-C. & Floreano, D. Optic-flow based control of a 46 g quadrotor. In Proc. Workshop on Vision-based Closed-Loop Control and Navigation of Micro Helicopters in GPS-denied Environments http://rpg.ifi.uzh.ch/IROS13_TOC.html (2013).
65. Schuppe, H. & Hengstenberg, R. Optical properties of the ocelli of Calliphora erythrocephala and their role in the dorsal light response. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 173, 143–149 (1993).
66. Fuller, S. B., Karpelson, M., Censi, A., Ma, K. Y. & Wood, R. J. Controlling free flight of a robotic fly using an onboard vision sensor inspired by insect ocelli. J. R. Soc. Interface 11, 20140281 (2014).
67. Gremillion, G., Humbert, J. S. & Krapp, H. G. Bio-inspired modeling and implementation of the ocelli visual system of flying insects. Biol. Cybern. 108, 735–746 (2014).
68. Ruffier, F. & Franceschini, N. Optic flow regulation in unsteady environments: a tethered MAV achieves terrain following and targeted landing over a moving platform. J. Intell. Robot. Syst. http://dx.doi.org/10.1007/s10846-014-0062-5 (2014).
69. Ruffier, F., Viollet, S., Amic, S. & Franceschini, N. Bio-inspired optical flow circuits for the visual guidance of micro air vehicles. In Proc. International Symposium on Circuits and Systems 3, 846–849 (2003).
70. Duhamel, P.-E. J., Pérez-Arancibia, N. O., Barrows, G. & Wood, R. J. Biologically
[Reference entries 71–98 were lost in extraction.]
Acknowledgements D.F. and R.J.W. thank the Wyss Institute for Biologically
Inspired Engineering at Harvard University, where this Review was written. D.F.
also thanks the Swiss National Science Foundation through the National Centre of
Competence in Research Robotics.
Author Information Reprints and permissions information is available at
www.nature.com/reprints. The authors declare no competing financial
interests. Readers are welcome to comment on the online version of this
paper at go.nature.com/xukalu. Correspondence should be addressed to D.F.
(dario.floreano@epfl.ch).
REVIEW
doi:10.1038/nature14543
Design, fabrication and control of soft robots
Daniela Rus & Michael T. Tolley
Conventionally, engineers have employed rigid materials to fabricate precise, predictable robotic systems, which are easily
modelled as rigid members connected at discrete joints. Natural systems, however, often match or exceed the performance of robotic systems with deformable bodies. Cephalopods, for example, achieve amazing feats of manipulation and
locomotion without a skeleton; even vertebrates such as humans achieve dynamic gaits by storing elastic energy in their
compliant bones and soft tissues. Inspired by nature, engineers have begun to explore the design and control of soft-bodied
robots composed of compliant materials. This Review discusses recent developments in the emerging field of soft robotics.
Biology has long been a source of inspiration for engineers making ever-more capable machines1. Softness and body compliance
are salient features often exploited by biological systems, which
tend to seek simplicity and show reduced complexity in their interactions
with their environment2. Several of the lessons learned from studying
biological systems are now culminating in the definition of a new class
of machine that we, and others, refer to as soft robots3–6. Conventional,
rigid-bodied robots are used extensively in manufacturing and can be
specifically programmed to perform a single task efficiently, but often
with limited adaptability. Because they are built of rigid links and joints,
they are unsafe for interaction with humans. A common practice is to
separate human and robotic work spaces in factories to mitigate safety
concerns. The lack of compliance in conventional actuation mechanisms
is part of this problem. Soft robots provide an opportunity to bridge the
gap between machines and people. In contrast to hard-bodied robots, soft
robots have bodies made out of intrinsically soft and/or extensible materials (for example, silicone rubbers) that can deform and absorb much
of the energy arising from a collision. These robots have a continuously
deformable structure with muscle-like actuation that emulates biological systems and results in a relatively large number of degrees of freedom
compared with their hard-bodied counterparts. They (Fig. 1) have the
potential to exhibit unprecedented adaptation, sensitivity and agility. Soft
robots promise to be able to bend and twist with high curvatures and thus
can be used in confined spaces7; to deform their bodies in a continuous
way and thus achieve motions that emulate biology8; to adapt their shape
to the environment, employing compliant motion and thus manipulate
objects9, or move on rough terrain and exhibit resilience10; or to execute
rapid, agile manoeuvres, such as the escape manoeuvre in fish11.
The key challenge for creating soft machines that achieve their full
potential is the development of controllable soft bodies using materials
that integrate sensors, actuators and computation, and that together enable the body to deliver the desired behaviour. Conventional approaches
to robot control assume rigidity in the linkage structure of the robot and
are a poor fit for controlling soft bodies; soft materials therefore require new
algorithms.
What is soft?
'Soft' refers to the body of the robot. Soft materials are the key enablers for creating soft robot bodies. Although Young's modulus is only defined for homogeneous, prismatic bars that are subject to axial loading and small deformations, it is nonetheless a useful measure of the rigidity of materials used in the fabrication of robotic systems5. Materials conventionally used in robotics (for example, metals or hard plastics) have moduli in the order of 10^9–10^12 pascals, whereas natural organisms are often composed of materials (for example, skin or muscle tissue) with moduli in the order of 10^4–10^9 Pa (orders of magnitude lower than their engineered counterparts; Fig. 2). We define soft robots as systems that are capable of
autonomous behaviour, and that are primarily composed of materials with moduli in the range of that of soft biological materials.
The advantages of using materials with compliance similar to
that of soft biological materials include a considerable reduction
in the harm that could be inadvertently caused by robotic systems
(as has been demonstrated for rigid robots with compliant joints12),
increasing their potential for interaction with humans. Compliant
materials also adapt more readily to various objects, simplifying
tasks such as grasping13, and can also lead to improved mobility
over soft substrates14.
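The working definition above can be turned into a toy classifier; the threshold and the modulus values are order-of-magnitude figures taken from the ranges quoted in the text, not measured data:

```python
# A rough "is it soft?" check based on the modulus ranges given in the text:
# engineering materials ~1e9-1e12 Pa, soft biological materials ~1e4-1e9 Pa.
# All values are order-of-magnitude figures, not measurements.

SOFT_THRESHOLD_PA = 1e9  # roughly the upper bound cited for soft materials

MODULI_PA = {
    "steel": 2e11,
    "nylon": 3e9,
    "silicone rubber": 1e6,
    "skin": 1e5,
    "muscle": 1e4,
}

# Materials whose modulus falls in the soft biological range:
soft = {name for name, e in MODULI_PA.items() if e < SOFT_THRESHOLD_PA}
# -> {'silicone rubber', 'skin', 'muscle'}
```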
For the body of a soft robot to achieve its potential, facilities for
sensing, actuation, computation, power storage and communication
must be embedded in the soft material, resulting in smart materials.
In addition, algorithms that drive the body to deliver the desired
behaviours are required. These algorithms implement impedance
matching to the structure of the body. This tight coupling between
body and brain allows us to think about soft-bodied systems as
machines with mechanical intelligence, in which the body can be
viewed as augmenting the brain with morphological computation15,16. This ability of the body to perform computation simplifies
the control algorithms in many situations, blurring the line between
the body and the brain. However, soft robots (like soft organisms)
require control algorithms, which (at least for the foreseeable future)
will run on some sort of computing hardware. Although both body
and brain must be considered in concert, the challenges involved
are distinct enough that, in this Review, we find it useful to organize
them into separate sections.
In the following three sections we review recent developments in
the field of soft robotics as they pertain to design and fabrication,
computation and control, and systems and applications. We then
discuss persistent challenges facing soft robotics, and suggest areas
in which we see the greatest potential for societal impact.
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, The Stata Center, Building 32, 32 Vassar Street, Cambridge, Massachusetts 02139, USA.
Department of Mechanical and Aerospace Engineering, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0403.
Design and fabrication
Figure 1 | Mobile soft-robotic systems inspired by a range of biological systems. a, Caterpillar-inspired locomotion20. b, A multi-gait quadruped29. c,
Active camouflage35. d, Walking in hazardous environments10. e, Worm-inspired locomotion88. f, Particle-jamming-based actuation42. g, Rolling powered
by a pneumatic battery28. h, A hybrid hard–soft robot89. i, Snake-inspired locomotion8. j, Jumping powered by internal combustion58. k, Manta-ray inspired
locomotion100. l, An autonomous fish11.
[Figure 2 plot residue: modulus axis spanning roughly 10^4 to 10^12 Pa; material labels include diamond, glass, tooth enamel, steel, wood, bone, fibreglass, nylon, polyethylene, muscle (low density), tendon, rubber, skin, liver tissue, polydimethylsiloxane, cartilage, artery, silicone elastomer, alginate hydrogel and fat.]
Figure 2 | Approximate tensile modulus (Young's modulus) of selected engineering and biological materials. Soft robots are composed primarily of
materials with moduli comparable with those of soft biological materials (muscles, skin, cartilage, and so on), or of less than around 1 gigapascal. These
materials exhibit considerable compliance under normal loading conditions.
have characteristic moduli in the range of 10^5–10^6 Pa) impart minimal change on the impedance of the underlying structures. These
sensors generally have layered structures, in which multiple thin
elastomer layers are patterned with microfluidic channels by soft
lithography. The channels are subsequently filled with a liquid
conductor (for example, gallium-containing alloys such as eutectic
galliumindium, or EGaIn). With layered channel geometries, it is
possible to tailor sensors for measuring various strains including
tensile, shear50 or curvature51. To address the fabrication challenges
involved in injecting complex channel networks with liquid conductor, recent work has investigated mask deposition of the conductor52, or direct 3D printing of conductive material53. Alternatively,
exteroceptive sensing may be used to measure the curvature of a
soft robots body segments in real time30. To expand the applications
of soft robotics, compatible chemical and biological sensors54 may
be used to sense environmental signals. Such sensors may be more
compatible with soft robots than the optical and audio recorders
typically used in robotics.
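A simple model of why these liquid-conductor channels work as strain sensors: for a volume-conserving channel, stretching by a strain eps lengthens the conductor and thins its cross-section, so resistance grows as (1 + eps)^2. A hypothetical sketch, not the calibration used in the cited work:

```python
# For an incompressible liquid-metal channel (constant volume), stretching
# by strain eps scales length by (1 + eps) and cross-sectional area by
# 1 / (1 + eps), so the resistance R = rho * L / A scales as (1 + eps)**2.
# Illustrative sketch only.

def relative_resistance(strain: float) -> float:
    """R(eps) / R(0) for a volume-conserving conductive channel."""
    return (1.0 + strain) ** 2

# A 50% stretch more than doubles the resistance:
ratio = relative_resistance(0.5)   # 2.25
```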
Power sources
A major challenge for soft robots is the development of stretchable, portable power
sources for actuation. For pneumatic actuators, existing fluidic
power sources are not soft and are usually big and bulky. Current
off-the-shelf pressure sources are generally limited to compressors
or pumps, and compressed air cylinders55. If we draw an analogy to
electrical systems, compressors are similar to generators as they convert electrical energy into mechanical energy, and compressed gas
cylinders are similar to capacitors as they store a pressurized fluid
in a certain volume to be discharged when required. Miniature compressors use valuable electrical energy inefficiently, and cylinders of
useful form factors do not offer longevity. What has been missing
for fluidic systems is the equivalent of a battery, whereby a chemical
reaction generates the necessary energy for actuation using a fuel.
The chemically operated portable pressure source, or pneumatic
battery28 (Fig. 1g), generates pressurized gas using a hydrogen peroxide monopropellant56. Combustible fuels are another promising
high-energy-density chemical fuel source57,58.
Electrically powered actuators (as well as the electrical controllers
for pneumatic systems) require soft, flexible, lightweight electrical
power sources59. As with soft electronics, this is an active area of
research. Recent promising developments include batteries based on
graphene60, organic polymers61 and embedded conductive fabric62.
Design
Existing soft robotic systems have typically been designed with conventional 3D computer-aided design (CAD) software. However, current CAD software was not created with free-form 3D fabrication
processes in mind, and does not easily accommodate the complex
non-homogeneous 3D designs that may be desired for soft robotics. This has led researchers to either rely on relatively simple 2.5D
Fabrication
Recent progress in the field of soft robotics has been enabled by the
development of rapid digital design and fabrication tools. Researchers have manufactured complex soft-robotic systems by taking
advantage of rapid and adaptable fabrication techniques66, including multimaterial 3D printing67, shape deposition manufacturing
(SDM)68 and soft lithography69. These techniques can be combined
to create composites with heterogeneous materials (for example,
rubber with different stiffness moduli), embedded electronics and
internal channels for actuation31,70. Direct digital printing with soft
materials may be another option for fabricating arbitrary soft structures. A variety of robot bodies can be fabricated using these techniques. The next section discusses control systems supporting a wide
range of soft-robot applications.
These approaches are robot-specific in that they integrate the morphology of the robot and the characteristics of its actuation system. One approach uses Bernoulli–Euler beam mechanics to predict deformation3,75,76; another develops a relationship between the joint variables and the curvature arc parameters77, which applies to robots operated at high and medium pressures; and a third presents models that describe the deformation of robots actuated at low pressures11,28,29. These models are the basis for the forward kinematics of their respective systems. However, the piecewise constant curvature (PCC) model does not capture all aspects of soft robots and, in an effort to increase the envelope of the model, non-constant-curvature models have been developed78.
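The constant-curvature assumption makes forward kinematics concrete: each segment is summarized by a curvature and an arc length, and the tip pose follows from circular-arc geometry. A minimal planar sketch (the function names and the two-segment example are illustrative, not taken from the cited models):

```python
import math

def pcc_segment_pose(kappa, length):
    """Planar tip pose (x, y, theta) of one constant-curvature segment.

    kappa: curvature (1/m); length: arc length (m).
    A straight segment is the kappa -> 0 limit of the arc.
    """
    if abs(kappa) < 1e-9:            # straight-line limit
        return length, 0.0, 0.0
    theta = kappa * length           # total bending angle of the arc
    x = math.sin(theta) / kappa
    y = (1.0 - math.cos(theta)) / kappa
    return x, y, theta

def forward_kinematics(segments):
    """Compose segment tip poses along the arm (2D rigid transforms)."""
    x, y, theta = 0.0, 0.0, 0.0
    for kappa, length in segments:
        sx, sy, st = pcc_segment_pose(kappa, length)
        # rotate the segment-local tip pose into the world frame
        x += sx * math.cos(theta) - sy * math.sin(theta)
        y += sx * math.sin(theta) + sy * math.cos(theta)
        theta += st
    return x, y, theta

# Two unit-length segments, each bent through a quarter circle:
# tip at x ~ 0, y ~ 1.27, heading 180 degrees.
x, y, theta = forward_kinematics([(math.pi / 2, 1.0), (math.pi / 2, 1.0)])
```

Chaining more segments, each with its own curvature and length, gives the forward kinematics of a multi-segment arm under the PCC assumption.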
The inverse-kinematics problem, as posed in ref. 75 (computing
the curvatures required to place a specified point on the robot body
at a desired location), is more challenging. Existing literature has
provided several approaches for semi-soft robots77,79. A limitation of existing approaches to solving the inverse-kinematics problem for linear soft bodies (for example, arms) is that currently neither the whole body nor the pose of the end effector (which may be important for manipulation, sensing, and so on) is considered in the solution. Autonomous obstacle avoidance and movement through a
confined environment are difficult without a computational solution
to the inverse-kinematics problem that is aware of the whole body of
the robot in space. Real-time, closed-loop curvature controllers are
required that drive the bending of the soft pneumatic body segments
of the manipulator despite their high compliance and lack of kinematic constraints. One method for closed-loop curvature control
of a soft-bodied planar manipulator30 uses the PCC assumption with
a cascaded curvature controller. An alternative visual servo control
approach for cable-driven soft-robotic manipulators has also been
proposed80.
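In outline, such a cascaded scheme pairs an outer loop on measured curvature with an inner loop on chamber pressure. The sketch below is a generic, hypothetical stand-in (the gains, saturation limits and pure PI structure are invented for illustration, not the controller of ref. 30):

```python
class CurvatureController:
    """Cascaded control sketch for one soft bending segment: an outer
    PI loop on curvature error produces a pressure setpoint; an inner
    proportional loop turns pressure error into a bounded valve command."""

    def __init__(self, kp=2.0, ki=0.5, dt=0.01):
        self.kp, self.ki, self.dt = kp, ki, dt
        self.integral = 0.0

    def pressure_setpoint(self, kappa_ref, kappa_meas):
        # outer loop: curvature error -> desired chamber pressure
        err = kappa_ref - kappa_meas
        self.integral += err * self.dt
        return self.kp * err + self.ki * self.integral

    def valve_command(self, p_ref, p_meas, gain=1.0):
        # inner loop: pressure error -> valve command, saturated to [-1, 1]
        return max(-1.0, min(1.0, gain * (p_ref - p_meas)))
```

At each control step the measured curvature (for example, from bend or strain sensors) feeds `pressure_setpoint`, whose output is tracked by `valve_command` driving the inlet and vent valves.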
The inverse-kinematics algorithm enables task-space planning algorithms to autonomously position the end effector (or some other part of the robot body) in free space; manoeuvre in confined environments; and grasp and place objects. These task-space algorithms
require the planner to consider the entire robot body (the whole
arm, for example, not just the end effector for manipulators). Existing algorithms for whole-body control81 assume rigid body systems
and have not been extended to soft-bodied robots, in which compliance makes the configuration of the whole body difficult to control. One computational
approach30 to whole-body planning for soft planar manipulators
considers both the primary task of advancing the pose of the end
effector of the soft robot, and the secondary task of positioning the
changing envelope of the whole robot in relation to the environment7,30.
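This primary/secondary task split is commonly written with a null-space projector: the secondary task is executed only in directions that leave the primary task undisturbed. A generic numerical sketch with NumPy (the Jacobians and task velocities below are placeholders, not the soft-arm models of refs 7, 30):

```python
import numpy as np

def prioritized_velocities(J1, dx1, J2, dx2):
    """Actuation-space velocities that achieve the primary task velocity
    dx1 (in a least-squares sense) and the secondary task velocity dx2
    only within the null space of the primary task."""
    J1_pinv = np.linalg.pinv(J1)
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1          # null-space projector
    q_dot = J1_pinv @ dx1                            # primary task
    q_dot = q_dot + N1 @ np.linalg.pinv(J2 @ N1) @ (dx2 - J2 @ q_dot)
    return q_dot

# Example: 3 actuation variables; the primary task fixes the x-velocity
# of the end effector, the secondary task shapes the body's y-velocity.
J1 = np.array([[1.0, 0.0, 0.0]])
J2 = np.array([[0.0, 1.0, 0.0]])
q_dot = prioritized_velocities(J1, np.array([1.0]), J2, np.array([2.0]))
```

Because the secondary correction is projected through N1, it cannot disturb the primary task: J1 @ q_dot equals dx1 exactly whenever dx1 is feasible.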
Control
Researchers have used these models to develop new approaches to
low-level control, inverse kinematics, dynamic operations and planning for soft-robotic systems. An important aspect of these algorithms is the use of compliance when the robot interacts with its
environment. Compliance allows the robot to adapt its shape and
function to objects of unknown or uncertain geometry and is the
basis for new control and planning algorithms for soft robots in the presence of uncertainty. For example, soft robots can execute
pick and place operations without precise positioning or accurate
geometric models of the object to be grasped44.
Low-level control for soft robots is achieved either by pressure control using pressure transducers or by volume control using strain sensors. Pressure control accommodates differences in actuator compliance. Volume control is an avenue to configuration control and supports setting a maximum safe displacement limit. Most fluid-powered soft robots use open-loop valve sequencing to control body-segment actuation. Valve sequencing means that a valve is turned on for some duration of time to pressurize the actuator and then turned off to either hold or deflate it. Many soft-robotic systems use this simple open-loop control approach.
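In code, open-loop valve sequencing is just a timed schedule of valve states. A toy sketch (the three-segment gait table and timings are invented for illustration):

```python
# (segment name, seconds valve open to pressurize, seconds to hold)
SEQUENCE = [
    ("segment_1", 0.4, 0.2),
    ("segment_2", 0.4, 0.2),
    ("segment_3", 0.4, 0.2),
]

def run_cycle(sequence, command, wait):
    """Run one open-loop actuation cycle.

    command(name, open) sets a valve state (on hardware this would drive
    a solenoid); wait(seconds) paces the schedule. Opening each valve in
    turn produces a travelling wave of pressurization along the body."""
    for name, t_open, t_hold in sequence:
        command(name, True)     # open the valve to pressurize the segment
        wait(t_open)
        command(name, False)    # close the valve to hold (or vent)
        wait(t_hold)
```

On hardware, `command` and `wait` would wrap valve drivers and `time.sleep`; here they are injected so that the schedule itself stays testable.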
[Figure labels: actuation schematics showing tension cables, a pneumatic channel (P = 0, P > 0) and a strain-limited layer at rest and in action; soft robotic fish components (caudal, dorsal and ventral fins, fluid elastomer actuator, gear pump, dive plane, battery, electronics housing) and high-speed frames at 0, 66, 133 and 200 ms (scale bar, 50 mm).]
Figure 5 | A soft robotic fish. Soft robotic fish physical prototype (top) and design schematic31. Images from a high-speed video of the fish executing a C-turn escape manoeuvre, with elapsed time indicated11 (bottom).
In this section we review some of the systems that have been developed so far to address a variety of potential applications in locomotion, manipulation and human–machine interaction.
Locomotion
Recent work has explored the modes of locomotion possible with (or enabled by) soft bodies (Fig. 1). Notably, studies of caterpillars have led to soft-robotic systems20,73, as well as to a further understanding of the control of motion in these animals74. An understanding of worm biomechanics also led to a bioinspired worm design composed of flexible materials and actuated by shape-memory alloy (SMA) actuators88, and to an annelid-inspired robot driven by dielectric elastomer actuators22. A European initiative studied the biomechanics and control of the octopus to produce a soft-robotic prototype9,24. Likewise, a self-contained, autonomous robotic fish actuated by soft fluidic actuators could swim forward, turn and adjust its depth11,31 (Fig. 5). This robot can execute a C-turn escape manoeuvre in 100 milliseconds, which is on a par with its biological counterpart, enabling it to be used as an instrument (a physical model) for biological studies. Soft-robotics projects have also explored quadrupedal locomotion10,29, rolling28 and snake-like undulation8, and jumping has been demonstrated using internal combustion of hydrocarbon fuels for rapid energy release57,58.
Many of the exciting applications for soft robotics (such as search-and-rescue operations or environmental monitoring) require an autonomous, mobile system. However, most experimental soft-robotic systems rely on power and/or control signals delivered through pneumatic and/or electrical tethers. Since typical actuation power sources (for example, air compressors or batteries) are relatively heavy (for example, 1.2 kg10), tethers greatly simplify system design by significantly reducing the required carrying capacity.
One approach to achieving mobile systems is to tether a soft robot to
a mobile rigid robot with a greater carrying capacity89. Untethered
mobile systems have circumvented the challenge of carrying heavy
power sources by operating underwater11,31 or rolling on a surface8,28
such that actuation system masses do not have to be lifted against
gravity. Another approach has led to materials and designs tailored
to operate at working pressures that are high enough to carry large
payloads10.
Manipulation
Manipulation is one of the canonical challenges for robotics (Fig. 3).
Soft systems have a natural advantage over rigid robots in grasping
and manipulating unknown objects because the compliance of soft
grippers allows them to adapt to a variety of objects with simple control schemes. Grippers that employ isothermal phase change owing
to granular jamming have taken advantage of this feature of soft systems44,90. Under-actuated grippers composed of silicone elastomers
with embedded pneumatic channels have also shown impressive
adaptability13,33. Commercially developed systems have also demonstrated manipulation with lightweight grippers composed of inflated
flexible (but not extensible) material91. As one of the more mature applications of soft-robotic technology, manipulation has attracted commercial interest: companies such as Pneubotics (an Otherlab company), Empire Robotics and Soft Robotics have begun to produce soft-robotic manipulators.
Medical and wearable applications
One of the natural advantages of soft-robotic systems is the compatibility of their moduli with those of natural tissues for medical and wearable applications. Rigid medical devices or orthoses run the risk of causing damage or discomfort to human or animal tissue. In addition, it can be difficult to perfectly replicate the motion of natural joints with rigid orthotics. One possibility is to integrate a degree of compliance into wearable devices, for example for orthopaedic rehabilitation. Recently, researchers have begun to look at medical, wearable applications for soft robotics, including soft wearable
input devices (for example, wearable keyboards92), soft orthotics for ankle–foot rehabilitation37, soft sensing suits for lower-limb measurement93, soft actuated systems for gait rehabilitation of rodents with surgically transected spinal cords94, and a soft system for simulation of cardiac actuation95.
Soft cyborgs
Recent work has begun to investigate robotic systems that directly
integrate biological (as opposed to artificial, biologically compatible)
materials. Because biological materials are often very soft, the resulting systems are soft-robotic systems (or perhaps they would be more
aptly named soft cyborgs). For example, microbes that digest organic
material and produce electricity have powered artificial muscles for
autonomous robots96, and cardiomyocytes have been used to power
a jellyfish-inspired swimming cyborg97. One challenge with using
swarms of inexpensive soft robots for exploration is retrieving the
robots once the task is completed. One way to avoid this problem is
to develop biodegradable and soft robots powered by gelatin actuators98. Since gelatin is edible, there may also be medical applications
for this technology.
Future directions
The field of soft robotics aims to create the science and applications of
soft autonomy by asking the questions: how do we design and control soft
machines, and how do we use these new machines? Current research on
the algorithmic and device-level aspects of soft robots has demonstrated
soft devices that can locomote, manipulate, and interact with people
and their environment in unique ways. Soft mobile robots capable of
locomotion on unknown terrain and soft-robot manipulators capable
of pose-invariant and shape-invariant grasping rely on compliance to
mitigate uncertainty and adapt to their environment and task. These
basic behaviours open the door to applications in which robots work
closely with humans. For example, a soft-robot manipulator could handle and prepare a wide variety of food items in the kitchen while a soft
mobile robot could use contact to guide patients during physiotherapy
exercises. The soft home robot could assist the elderly with monitoring
their medicine regimens, whereas a soft factory robot could support
humans in delicate assembly tasks. But how do we get to the point where
soft robots deliver on their full potential? We need rapid design tools and
fabrication recipes for low-cost soft robots, novel algorithmic approaches
to the control of soft robots that account for their material properties,
tools for developing device-specific programming environments that allow non-experts to use the soft machines, creative individuals to design new solutions, and early adopters to put the technology to use.
The soft-robotics community is creating materials that have the ability
to compute, sense and move, leading to a convergence between materials
and machines. The materials are becoming more like computers owing
to embedded electro-mechanical components, and machines are getting softer owing to the alternative materials used to create them. This
convergence requires tools for functional specification and automated
co-design of the soft body (including all the components necessary for
actuation, sensing and computation) and the soft brain (including all
software aspects of controlling the device and reasoning about the world
and the goal task). Progress in new materials with programmable stiffness properties, proprioceptive sensing, contact modelling and recipes for rapid fabrication will enable the creation of increasingly capable soft machines.
But how soft should a soft robot be in order to meet its true potential?
As with many aspects of robotics, this depends on the task and environment. For domains that require extreme body compliance (for example, inspecting a pipe with complex geometry or laparoscopic surgery) or
dealing with a great amount of uncertainty (for example, locomotion
on rocky uneven terrain or grasping unknown objects), soft machines
can bring the capabilities of robots to new levels. However, soft robots
are difficult to model and control, especially when they need to retain a
desired body configuration without external support (for example, an
12. Albu-Schaffer, A. et al. Soft robotics. IEEE Robot. Automat. Mag. 15, 20–30 (2008).
13. Deimel, R. & Brock, O. A novel type of compliant, underactuated robotic hand for dexterous grasping. In Proc. Robotics: Science and Systems 1687–1692 (2014).
This paper describes the design and testing of an inexpensive, modular, underactuated soft robot hand with pneumatically actuated fibre-reinforced elastomer digits.
14. Majidi, C., Shepherd, R. F., Kramer, R. K., Whitesides, G. M. & Wood, R. J. Influence of surface traction on soft robot undulation. Int. J. Robot. Res. 32, 1577–1584 (2013).
15. Paul, C. Morphological computation: a basis for the analysis of morphology and control requirements. Robot. Auton. Syst. 54, 619–630 (2006).
16. Hauser, H., Ijspeert, A. J., Fuchslin, R. M., Pfeifer, R. & Maass, W. Towards a theoretical foundation for morphological computation with compliant bodies. Biol. Cybern. 105, 355–370 (2011).
17. Hannan, M. W. & Walker, I. D. Kinematics and the implementation of an elephant's trunk manipulator and other continuum style robots. J. Robot. Syst. 20, 45–63 (2003).
18. Camarillo, D. B., Carlson, C. R. & Salisbury, J. K. Configuration tracking for continuum manipulators with coupled tendon drive. IEEE Trans. Robot. 25, 798–808 (2009).
19. McMahan, W. et al. Field trials and testing of the OctArm continuum manipulator. In Proc. IEEE International Conference on Robotics and Automation 2336–2341 (2006).
20. Lin, H., Leisk, G. & Trimmer, B. Soft robots in space: a perspective for soft robotics. Acta Futura 6, 69–79 (2013).
21. Correll, N., Onal, C., Liang, H., Schoenfeld, E. & Rus, D. in Experimental Robotics 227–240 (Springer, 2014).
22. Jung, K. et al. Artificial annelid robot driven by soft actuators. Bioinspir. Biomim. 2, S42–S49 (2007).
23. Calisti, M. et al. An octopus-bioinspired solution to movement and manipulation for soft robots. Bioinspir. Biomim. 6, 036002 (2011).
24. Laschi, C. et al. Soft robot arm inspired by the octopus. Adv. Robot. 26, 709–727 (2012).
25. Schulte, H. in The Application of External Power in Prosthetics and Orthotics 94–115 (1961).
26. Chou, C.-P. & Hannaford, B. Measurement and modeling of McKibben pneumatic artificial muscles. IEEE Trans. Robot. Automat. 12, 90–102 (1996).
27. Suzumori, K., Iikura, S. & Tanaka, H. Applying a flexible microactuator to robotic mechanisms. IEEE Control Syst. 12, 21–27 (1992).
28. Onal, C. D., Chen, X., Whitesides, G. M. & Rus, D. Soft mobile robots with on-board chemical pressure generation. In Proc. International Symposium on Robotics Research 1–16 (2011).
29. Shepherd, R. F. et al. Multigait soft robot. Proc. Natl Acad. Sci. USA 108, 20400–20403 (2011).
This paper describes the rapid design and fabrication of a soft robot body capable of walking and undulating tethered to a pneumatic actuation system.
30. Marchese, A. D., Komorowski, K., Onal, C. D. & Rus, D. Design and control of a soft and continuously deformable 2D robotic manipulation system. In Proc. IEEE International Conference on Robotics and Automation 2189–2196 (2014).
31. Katzschmann, R. K., Marchese, A. D. & Rus, D. Hydraulic autonomous soft robotic fish for 3D swimming. In Proc. International Symposium on Experimental Robotics 1122374 (2014).
32. Polygerinos, P., Wang, Z., Galloway, K. C., Wood, R. J. & Walsh, C. J. Soft robotic glove for combined assistance and at-home rehabilitation. Robot. Auton. Syst. http://dx.doi.org/10.1016/j.robot.2014.08.014 (2014).
This paper demonstrates a wearable soft robotic system, which augments the grasping force of wearers for rehabilitation.
33. Ilievski, F., Mazzeo, A. D., Shepherd, R. F., Chen, X. & Whitesides, G. M. Soft robotics for chemists. Angew. Chem. 123, 1930–1935 (2011).
This paper demonstrates the effective use of methods and materials from chemistry and soft-materials science in the fabrication of soft robots.
34. Martinez, R. V. et al. Robotic tentacles with three-dimensional mobility based on flexible elastomers. Adv. Mater. 25, 205–212 (2013).
35. Morin, S. A. et al. Camouflage and display for soft machines. Science 337, 828–832 (2012).
36. Mosadegh, B. et al. Pneumatic networks for soft robotics that actuate rapidly. Adv. Funct. Mater. 24, 2163–2170 (2014).
37. Park, Y.-L. et al. Design and control of a bio-inspired soft wearable robotic device for ankle–foot rehabilitation. Bioinspir. Biomim. 9, 016007 (2014).
38. Bar-Cohen, Y. Electroactive Polymer (EAP) Actuators as Artificial Muscles: Reality, Potential, and Challenges (SPIE, 2004).
39. Wallace, G. G., Teasdale, P. R., Spinks, G. M. & Kane-Maguire, L. A. Conductive Electroactive Polymers: Intelligent Polymer Systems (CRC, 2008).
40. Cheng, N. G., Gopinath, A., Wang, L., Iagnemma, K. & Hosoi, A. E. Thermally tunable, self-healing composites for soft robotic applications. Macromol. Mater. Eng. 299, 1279–1284 (2014).
41. Shan, W., Lu, T. & Majidi, C. Soft-matter composites with electrically tunable elastic rigidity. Smart Mater. Struct. 22, 085005 (2013).
42. Steltz, E., Mozeika, A., Rodenberg, N., Brown, E. & Jaeger, H. M. JSEL: jamming skin enabled locomotion. In Proc. International Conference on Intelligent Robots and Systems 5672–5677 (2009).
kinematics for continuum manipulators. Adv. Robot. 23, 2077–2091 (2009).
80. Wang, H. et al. Visual servo control of cable-driven soft robotic manipulator. In Proc. International Conference on Intelligent Robots and Systems 57–62 (2013).
81. Khatib, O., Sentis, L., Park, J. & Warren, J. Whole body dynamic behavior and control of human-like robots. Int. J. Humanoid Robot. 1, 29–43 (2004).
82. Napp, N., Araki, B., Tolley, M. T., Nagpal, R. & Wood, R. J. Simple passive valves for addressable pneumatic actuation. In Proc. International Conference on Robotics and Automation 1440–1445 (2014).
83. Chirikjian, G. S. Hyper-redundant manipulator dynamics: a continuum approximation. Adv. Robot. 9, 217–243 (1994).
84. Yekutieli, Y. et al. Dynamic model of the octopus arm. I. Biomechanics of the octopus reaching movement. J. Neurophysiol. 94, 1443–1458 (2005).
This paper describes a 2D dynamic model for a soft manipulator based on muscular hydrostats.
85. Snyder, J. & Wilson, J. Dynamics of the elastica with end mass and follower loading. J. Appl. Mech. 57, 203–208 (1990).
86. Tatlicioglu, E., Walker, I. D. & Dawson, D. M. Dynamic modeling for planar extensible continuum robot manipulators. In Proc. International Conference on Robotics and Automation 1357–1362 (2007).
87. Luo, M., Agheli, M. & Onal, C. D. Theoretical modeling and experimental analysis of a pressure-operated soft robotic snake. Soft Robotics 1, 136–146 (2014).
88. Seok, S. et al. Meshworm: a peristaltic soft robot with antagonistic nickel titanium coil actuators. IEEE/ASME Trans. Mechatronics 18, 1485–1497 (2013).
89. Stokes, A. A., Shepherd, R. F., Morin, S. A., Ilievski, F. & Whitesides, G. M. A hybrid combining hard and soft robots. Soft Robotics 1, 70–74 (2014).
90. Amend, J. R., Brown, E. M., Rodenberg, N., Jaeger, H. M. & Lipson, H. A positive pressure universal gripper based on the jamming of granular material. IEEE Trans. Robot. 28, 341–350 (2012).
91. Sanan, S., Lynn, P. S. & Griffith, S. T. Pneumatic torsional actuators for inflatable robots. J. Mech. Robot. 6, 031003 (2014).
92. Kramer, R. K., Majidi, C. & Wood, R. J. Wearable tactile keypad with stretchable
REVIEW
doi:10.1038/nature14544
Evolution has provided a source of inspiration for algorithm designers since the birth of computers. The resulting field,
evolutionary computation, has been successful in solving engineering tasks ranging in outlook from the molecular to the
astronomical. Today, the field is entering a new phase as evolutionary algorithms that take place in hardware are developed,
opening up new avenues towards autonomous machines that can adapt to their environment. We discuss how evolutionary computation compares with natural evolution and what its benefits are relative to other computing approaches, and
we introduce the emerging area of artificial evolution in physical systems.
The proposal that evolution could be used as a metaphor for problem solving came with the invention of the computer1. In the 1970s and 1980s the principal idea was developed into different algorithmic implementations under names such as evolutionary programming2, evolution strategies3,4 and genetic algorithms5, followed later by genetic programming6. These branches merged in the 1990s, and in the past 20 years so-called evolutionary computation or evolutionary computing has proven to be highly successful across a wide range of computational tasks in optimization, design and modelling7–9.
For instance, urgent needs in the development of low-cost thin-film
photovoltaic technologies were addressed using genetic algorithms
for topology optimization10. This led to highly efficient light-trapping
structures that exhibited more than a threefold increase over a classic
limit, and achieved efficiency levels far beyond the reach of intuitive
designs. It has also been convincingly demonstrated that evolutionary
approaches are powerful methods for knowledge discovery. For example, equations were evolved to model motion-tracking data captured
from various physical systems, ranging from simple harmonic oscillators to chaotic double pendula. This approach discovered, with limited
prior knowledge of physics, kinematics or geometry, several laws of
geometric and momentum conservation, and uncovered the alphabet
used to describe those systems11.
From the perspective of the underlying substrate in which the evolution takes place, the emergence of evolutionary computation can be
considered as a major transition of the evolutionary principles from
wetware, the realm of biology, to software, the realm of computers.
Today the field is at an exciting stage. New developments in robotics and
rapid prototyping (3D printing) are paving the way towards a second
major transition: from software to hardware, going from digital evolutionary systems to physical ones12,13.
In this Review we outline the working principles of evolutionary algorithms, and briefly discuss the differences between artificial and natural
evolution. We illustrate the power of evolutionary problem solving by
discussing a number of successful applications, reflect on the features
that make evolutionary algorithms so successful, review the current
trends of the field, and give our perspective on future developments.
1VU University Amsterdam, de Boelelaan 1081a, 1081 HV Amsterdam, the Netherlands. 2University of the West of England, Bristol BS16 1QY, UK.
problem-dependent phenotypes can be mapped to one of the standard
genotypes. From that point on, freely available evolutionary algorithm
machinery can be used.
It should be noted that just because an algorithm is formally suitable, it does not necessarily mean it will be successful. Suitability only
means that the evolutionary algorithm is capable of searching through
the space of possible solutions of a problem, but gives no guarantees that
this search will be either effective or efficient.
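The loop shared by all these variants (initialize a population, then repeat evaluation, selection and variation until termination) is compact. A minimal genetic-algorithm sketch on bitstrings, using the toy "count the ones" (OneMax) fitness; the population size, rates and operators are arbitrary illustrative choices:

```python
import random

random.seed(1)

def evolve(fitness, n_bits=20, pop_size=30, generations=100,
           p_mut=0.05, tournament=3):
    """Minimal generational GA: tournament selection, one-point
    crossover and bit-flip mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def pick():  # tournament selection: best of a random sample
            return max(random.sample(pop, tournament), key=fitness)
        nxt = []
        while len(nxt) < pop_size:
            a, b = pick(), pick()
            cut = random.randrange(1, n_bits)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < p_mut else g
                     for g in child]                   # bit-flip mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = evolve(sum)   # OneMax: fitness is simply the number of ones
```

Swapping in a different `fitness` function and representation is all it takes to retarget the same loop at another problem, which is precisely the generality the text describes.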
[Figure: the evolutionary algorithm cycle: initialization, evaluation, selection, variation, termination.]
From a historical perspective, humans have had two roles in evolution. Just like any other species, humans are the product of, and are
subject to, evolution. But for millennia (in fact, for about twice as
long as we have used wheels) people have also actively influenced
the course of evolution in other species by choosing which plants
or animals should survive or mate. Thus humans have successfully
exploited evolution to create improved food sources14 or more useful
animals15, even though the mechanisms involved in the transmission
of traits from one generation to the next were not understood.
Historically, the scope of human influence in evolution was very
limited, being restricted to interfering with selection for survival and
reproduction. Influencing other components, such as the design of
genotypes, or mutation and recombination mechanisms, was far
beyond our reach. This changed with the invention of the computer,
which provided the possibility of creating digital worlds that are
very flexible and much more controllable than the physical reality
we live in. Together with the increased understanding of the genetic
mechanisms behind evolution, this brought about the opportunity
to become active masters of evolutionary processes that are fully
designed and executed by human experimenters from above.
It could be argued that evolutionary algorithms are not faithful models of natural evolution (Table 1). However, they certainly are a form of evolution. As Dennett16 said: "If you have variation, heredity, and selection, then you must get evolution."
From a computer-science perspective, evolutionary algorithms are
BOX 1
[Figure: mutation in the standard representations: bitstrings (parent 1110011000, child 1010011001); permutations (parent abcdefghijk, child agcdefbhijk); real-valued vectors; and trees.]
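Each standard representation in Box 1 comes with a matching mutation operator that produces the parent/child pairs shown. Minimal sketches (the mutation rate and Gaussian step size are arbitrary):

```python
import random

def mutate_bitstring(bits, p=0.1):
    """Flip each bit independently with probability p."""
    return [1 - b if random.random() < p else b for b in bits]

def mutate_permutation(perm):
    """Swap two positions, preserving the permutation property."""
    child = list(perm)
    i, j = random.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child

def mutate_real_vector(vec, sigma=0.1):
    """Perturb each coordinate with Gaussian noise of scale sigma."""
    return [x + random.gauss(0.0, sigma) for x in vec]
```

The point of representation-specific operators is that every child remains a valid genotype: a swapped permutation is still a permutation, a flipped bitstring is still a bitstring.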
Table 1 | Main differences between natural evolution and evolutionary algorithms
(Rows: fitness; selection; genotype–phenotype mapping; variation; execution; population.)
Natural evolution, fitness: observed quantity; an a posteriori effect of selection and reproduction (in the eye of the observer).
Natural evolution, variation: offspring are created from one parent (asexual reproduction) or two parents (sexual reproduction); horizontal gene transfer can accumulate genes from more individuals.
Natural evolution, execution: parallel, decentralized execution; birth and death events are not synchronized.
approach not only discovered effective antenna designs, but could also
adjust designs quickly when requirements changed. One of these antennas was actually constructed and deployed on the ST5 spacecraft, thus
becoming the first computer-evolved hardware in space. This project
also demonstrates a specific advantage of evolutionary over manual
design. The evolutionary algorithms generated and tested thousands
of completely new solutions, many with unusual structures that expert
antenna designers would be unlikely to produce. Evolutionary algorithms have also been successful in many other aeronautical and aerospace engineering endeavours. Problems in this field typically have
highly complex search spaces and multiple conflicting objectives. Population-based methods such as evolutionary algorithms have proven
effective at meeting the challenges of this combination. In particular,
so-called multi-objective evolutionary algorithms change the selection function to explicitly reward diversity, so that they discover and maintain high-quality solutions representing different trade-offs between the objectives; technically, they approximate diverse segments of the Pareto front28. Many examples can also be found in bioinformatics. For
instance, by mining the ChEMBL database (which contains bioactive
molecules with drug-like properties), a set of transformations of chemical structures was identified that were then used as the mutation operator in an automated drug-design application29. The results showed clear
benefits, particularly in accommodating multiple target profiles such as
desired polypharmacology. This nicely illustrates how other approaches,
or existing knowledge, can be easily co-opted or accommodated within
an evolutionary computing framework.
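The Pareto-front approximation mentioned above rests on the dominance relation: one objective vector dominates another if it is at least as good in every objective and strictly better in at least one. A small sketch (minimization in all objectives is assumed):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Filter a list of objective vectors down to the non-dominated set."""
    return [p for p in points
            if not any(dominates(q, p) for q in points)]

# (cost, mass) pairs: three trade-offs survive, two dominated points drop out.
front = pareto_front([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)])
# → [(1, 5), (2, 2), (5, 1)]
```

A multi-objective evolutionary algorithm keeps such a non-dominated set alive in the population, using diversity-rewarding selection to spread it along the front.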
Numerical and combinatorial optimization are important application areas of evolutionary algorithms. Particularly challenging is black-box optimization, where the nature of the objective function requires numerical (rather than analytical) methods, and gradient information can only be approximated by sampling solutions. A systematic experimental study compared mathematical programming and evolutionary computation on a range of synthetic black-box optimization problems, which were allowed different amounts of computing time and resources30. The results showed that mathematical programming algorithms that were designed to provide quick progress in the initial stages of the search outperform evolutionary algorithms if the maximum number of evaluations is low, but this picture changes if the computational budget is increased. Ultimately, the study concludes that an evolutionary algorithm, in particular BIPOP-CMA-ES, is able to find the optimum of a broader class of functions, solve problems with
The hypothesis that embedding the principles of evolution within computer algorithms can create powerful mechanisms for solving difficult,
poorly understood problems is now supported by a huge body of evidence. Evolutionary problem solvers have proven capable of delivering
high-quality solutions to difficult problems in a variety of scientific and
technical domains, offering several advantages over conventional optimization and design methods.
One appealing example from the design domain concerns X-band
antennas for the NASA Space Technology 5 (ST5) spacecraft27. The
normal approach to this task is very time and labour intensive, relying heavily on expert knowledge. The evolutionary-algorithm-based
a higher precision and solve some problems faster. The power of evolution strategies (especially the very successful CMA-ES variants31) for
real-life black-box optimization problems from industry has been discussed extensively32. Evidence gathered from years of academic research
and development for industrial applications suggests that the niche for evolution strategies is formed by optimization tasks in which the budget for fitness evaluations is very limited.
Although this finding is not in line with conventional wisdom within
the field, there is ample support for this proposal.
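The budget argument can be made concrete with the simplest member of the family, the (1 + 1) evolution strategy: one parent, one Gaussian-perturbed offspring per fitness evaluation, and Rechenberg's 1/5th success rule for step-size control. This is a textbook sketch, not the CMA-ES machinery of ref. 31:

```python
import random

def one_plus_one_es(f, x0, sigma=1.0, budget=2000):
    """(1+1) evolution strategy minimizing f.

    Step size follows the 1/5th success rule: grow sigma on success,
    shrink it (by the fourth root of the growth factor) on failure, so
    that sigma is stationary at a success rate of about one in five."""
    x, fx = list(x0), f(x0)
    grow = 1.5
    for _ in range(budget):
        child = [xi + random.gauss(0.0, sigma) for xi in x]
        fc = f(child)
        if fc <= fx:                 # success: keep child, widen the search
            x, fx = child, fc
            sigma *= grow
        else:                        # failure: narrow the search
            sigma *= grow ** -0.25
    return x, fx

random.seed(3)
x_best, f_best = one_plus_one_es(lambda v: sum(t * t for t in v), [5.0, -3.0])
```

Every loop iteration costs exactly one fitness evaluation, which is why variants of this scheme suit the evaluation-limited industrial tasks described in the text.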
Machine learning and modelling is another prominent area in which
evolutionary algorithms have proved their power, especially as many
contemporary approaches would otherwise rely on (often crude) greedy
or local search algorithms to refine and optimize models. For example,
neuroevolutionary approaches use evolutionary algorithms to optimize the structure of artificial neural networks, their parameters, or both simultaneously33,34. In other branches of machine learning, using evolutionary
computing to design algorithms has been shown to be very effective as
an alternative to handcrafting them, for instance, for inducing decision
trees35. Furthermore, evolutionary algorithms have been applied to prediction problems. For instance, to tackle the problem of predicting the
tertiary structure of a protein, an algorithm was designed to evolve a key component of automated predictors: the function used to estimate a structure's energy36. State-of-the-art methods in protein-structure prediction are limited by assuming a linear combination of energy terms,
whereas a genetic programming method easily accommodates expressions based on a much richer syntax. The best energy function found
by the genetic programming algorithm provided significantly better
prediction guidance than conventional functions. The algorithm was
able to automatically discover the most and least useful energy terms,
without having any knowledge of how these terms alone are correlated
to the prediction error.
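The neuroevolutionary idea mentioned above, using an evolutionary algorithm to tune a neural network's parameters, can be sketched in a few lines. This toy sketch is not taken from refs 33,34; the network size, the XOR task and all mutation settings are illustrative assumptions. It evolves the nine weights of a fixed two-hidden-unit network with a simple (mu + lambda) evolution strategy:

```python
import math
import random

random.seed(0)

def forward(w, x1, x2):
    """Tiny fixed 2-2-1 network: two hidden tanh units, linear output."""
    h1 = math.tanh(w[0] * x1 + w[1] * x2 + w[2])
    h2 = math.tanh(w[3] * x1 + w[4] * x2 + w[5])
    return w[6] * h1 + w[7] * h2 + w[8]

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def loss(w):
    """Fitness is the (negated) squared error on the four XOR cases."""
    return sum((forward(w, x1, x2) - y) ** 2 for (x1, x2), y in XOR)

def evolve(generations=200, mu=5, lam=20, sigma=0.3):
    """(mu + lambda) evolution strategy on the flat weight vector."""
    pop = [[random.gauss(0, 1) for _ in range(9)] for _ in range(mu)]
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            parent = random.choice(pop)
            # Gaussian mutation is the only search operator here
            offspring.append([wi + random.gauss(0, sigma) for wi in parent])
        # Survivor selection: keep the mu best of parents plus offspring
        pop = sorted(pop + offspring, key=loss)[:mu]
    return pop[0]

best = evolve()
```

Evolving the topology as well, as some neuroevolution methods do, would add structural mutation operators to the same loop.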
The design of controllers for physical entities, such as machinery
or robots, has proved to be another fruitful area. For example, control
strategies for operating container cranes were evolved using a physical crane to determine fitness values37. The evolution of controllers is
also possible in situ, for example in a population of robots during, and
not just before, their operational period38,39. Evolutionary robotics40 is
an especially challenging application area because of two additional
issues that other branches of evolutionary computing do not face: the
very weak and noisy link between controllable design details and the
target feature or features; and the great variety of conditions under
which a solution should perform well. Normally in evolutionary computing there is a three-step evaluation chain: genotype to phenotype
to fitness. For robots the chain is four-step: genotype to phenotype to
behaviour to fitness. In this four-step chain the robots morphology
and controller form the phenotype. However, it could be argued that
the behaviour should be considered as the phenotype, because it is the
entity that is being evaluated. Furthermore, the behaviour depends
on many external factors, creating an unpredictable environment in
which the robot is expected to perform. Nevertheless, since the manual
design of an autonomous and adaptive mobile robot is extremely difficult, evolutionary approaches offer large potential benefits. These
include the possibility of continuous and automated design, manufacture and deployment of robots of very different morphologies and
control systems41. Several studies have demonstrated such benefits,
in which robot control systems that were automatically generated by
artificial evolution were comparatively simpler or more efficient than
those engineered using other design methods42. In all cases, robots
initially exhibited uncoordinated behaviour, but a few hundreds of
generations were sufficient to achieve efficient behaviours in a wide
range of experimental conditions.
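The four-step chain described above (genotype to phenotype to behaviour to fitness) can be made concrete with a minimal simulation. In this sketch, which is purely illustrative and not drawn from ref. 40, the genotype is a pair of controller gains, the phenotype is the resulting controller, the behaviour is a noisy simulated trajectory, and the fitness scores that behaviour:

```python
import random

random.seed(1)

def phenotype(genotype):
    """Genotype -> phenotype: decode two genes into a PD controller."""
    kp, kd = genotype
    def controller(error, d_error):
        return kp * error + kd * d_error
    return controller

def behaviour(controller, steps=50):
    """Phenotype -> behaviour: simulate a point robot homing on x = 1."""
    x, v, trace = 0.0, 0.0, []
    for _ in range(steps):
        error = 1.0 - x
        a = controller(error, -v) + random.gauss(0, 0.01)  # noisy actuation
        v += 0.1 * a
        x += 0.1 * v
        trace.append(x)
    return trace

def fitness(genotype, trials=3):
    """Behaviour -> fitness: penalize final distance from the target,
    averaged over several noisy trials (the unpredictable environment)."""
    ctrl = phenotype(genotype)
    return -sum(abs(1.0 - behaviour(ctrl)[-1]) for _ in range(trials)) / trials

# Simple generational evolution over the controller genotypes
pop = [(random.uniform(0, 2), random.uniform(0, 2)) for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]
    pop = parents + [(kp + random.gauss(0, 0.1), kd + random.gauss(0, 0.1))
                     for kp, kd in random.choices(parents, k=15)]
best = max(pop, key=fitness)
```

Averaging the fitness over repeated noisy trials is one simple response to the weak, noisy genotype-to-fitness link that makes evolutionary robotics hard.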
Several state-of-the-art algorithms for applications across a great variety of problem domains are based on hybridizing evolutionary search
with existing algorithms, especially local search methods. This kind of
hybridization can be thought of as adding lifetime learning to the evolutionary process. Freed from the restrictions of natural evolution (such as
learned traits not being written back immediately to the genotype), and
being able to experiment with novel types of individual and social learning, the theory and practice of so-called memetic algorithms has become
an important topic in the field43–45.
[Figure: from natural evolution in wetware (in vivo), through evolutionary computation in software (in silico), to the 'evolution of things' in hardware (in materio), across the twentieth and twenty-first centuries.]
Such hybrid algorithms can often find
good (or better) solutions faster than a pure evolutionary algorithm when
the additional method searches systematically in the vicinity of good
solutions, rather than relying on the more randomized search carried
out by mutation46,47. For example, the cell suppression problem (deciding which data cells to disclose in published statistical tables in order to
protect respondents' confidentiality)48 was solved using a combination of
graph partitioning, linear programming and evolutionary optimization
of the sequence in which vulnerable cells were considered. This produced
methods that could protect published statistical tables at a size that was
several orders of magnitude greater than had previously been possible.
Memetic algorithms have obtained an eminent place among the best
approaches to solving really hard problems.
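A memetic algorithm of the kind described, evolution with lifetime learning added, can be sketched as evolutionary search whose offspring are refined by local hill climbing before selection competes them against their parents. The test function and every setting below are illustrative choices, not taken from refs 43-47:

```python
import math
import random

random.seed(2)

def rastrigin(x):
    """Rugged test function with many local optima; global minimum 0 at the origin."""
    return sum(xi * xi - 10 * math.cos(2 * math.pi * xi) + 10 for xi in x)

def local_search(x, step=0.05, iters=20):
    """The 'lifetime learning' step: hill climbing in the vicinity of a candidate."""
    best, best_f = x, rastrigin(x)
    for _ in range(iters):
        cand = [xi + random.gauss(0, step) for xi in best]
        f = rastrigin(cand)
        if f < best_f:
            best, best_f = cand, f
    return best

def memetic(pop_size=20, generations=40, dim=2):
    pop = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=rastrigin)
        parents = pop[: pop_size // 4]
        # Global, randomized variation by mutation...
        children = [[xi + random.gauss(0, 0.5) for xi in random.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        # ...followed by systematic refinement near each new solution
        pop = parents + [local_search(c) for c in children]
    return min(pop, key=rastrigin)

best = memetic()
```

The division of labour matches the text: mutation supplies the randomized global moves, while the local search systematically exploits the neighbourhood of good solutions.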
Although considerable scepticism initially surrounded evolutionary algorithms, over the past 20 years evolutionary computation has
grown to become a major field in computational intelligence7–9. As well
as solving hard problems in various application areas, the emphasis of
evolutionary algorithms on randomness as a source of variation has
been shown to have particular advantages: the lack of problem-specific
preconceptions and biases of the algorithm designer opens up the way
to unexpected, original solutions that can even have artistic value49–51
(http://endlessforms.com/). The perception of evolution as a problem
solver has broadened from seeing evolution as a heuristic algorithm for
(parametric) optimization to considering it to be a powerful approach
for (structural) design52,53.
In general, evolutionary algorithms have proven competitive in
solving hard problems in the face of challenging characteristics like
non-differentiability, discontinuities, multiple local optima, noise and
nonlinear interactions among the variables, especially if the computational budgets are sufficiently high. Evolution is a slow learner, but the
steady increase in computing power, and the fact that the algorithm is
inherently suited to parallelization, mean that more and more generations can be executed within practically acceptable timescales.
The performance of evolutionary algorithms has also been compared with that of human experts, and there is now substantial and
well-documented evidence of evolutionary algorithms producing measurably human-competitive results54. The annual Humies competition
(http://www.genetic-programming.org/combined.php), which rewards
human-competitive results from evolutionary computation, highlights
the great variety of hard problems for which evolutionary algorithms
have delivered excellent solutions.
The success and popularity of evolutionary algorithms can be attributed to a number of algorithmic features, which make them attractive.
First, they are assumption free because applying an evolutionary algorithm consists of specifying the representation for candidate solutions
and providing an external function that first transforms the genotype
into a candidate solution and then provides an evaluation. Internally,
evolutionary algorithms make no explicit assumptions about the
problem, hence they are widely applicable and easily transferable at
low cost. Second, they are flexible because they can be easily used in
collaboration with existing methods, such as local search; they can be
incorporated within, or make use of, existing toolsets; and combinations with domain-specific methods often lead to superior solvers
because they can exploit the best features of different approaches.
Third, they are robust, owing to the use of a population and randomized choices, which mean evolutionary algorithms are less likely to
get trapped in suboptimal solutions than other search methods. They
are also less sensitive to noise or infidelity in the models of the system
used to evaluate solutions, and can cope with changes in the problem.
Fourth, they are not focussed on a single solution because having
a population means that an algorithm terminates with a number of
solutions. Thus, users do not have to specify their preferences and
weightings in advance, but can make decisions after they see what is
possible to achieve. This is a great advantage for problems with many
local optima, or with a number of conflicting objectives. Finally, they
are capable of producing unexpected solutions because they are blind
to human preconceptions and so can find effective, but non-intuitive
solutions, which are often valuable in design domains.
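The 'assumption free' interface described in the first point can be shown directly: the evolutionary loop below never inspects the problem, which enters only through user-supplied functions for initialization, variation and evaluation. The OneMax bit-counting task used to exercise it is a standard toy problem, and every name and setting here is an illustrative choice:

```python
import random

random.seed(3)

def evolve(init, mutate, evaluate, pop_size=30, generations=60):
    """Generic EA: internally problem-agnostic; the problem enters
    only through the init, mutate and evaluate callables."""
    pop = [init() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        pop = parents + [mutate(random.choice(parents))
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=evaluate)

# Plugging in a bit-string representation (the OneMax toy problem:
# maximize the number of 1s in a length-N bit string)
N = 40
one_max = evolve(
    init=lambda: [random.randint(0, 1) for _ in range(N)],
    mutate=lambda g: [1 - b if random.random() < 1 / N else b for b in g],
    evaluate=sum,
)
```

Swapping in a different representation, say real-valued vectors with Gaussian mutation, requires changing only the three supplied functions, which is what makes the approach widely applicable and easily transferable.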
The theoretical underpinning of evolutionary algorithms remains a
hard nut to crack. Mathematical analysis can illuminate some properties,
but even digital evolutionary processes exhibit very complex dynamics
that allow only limited theory forming, despite the diverse set of tools
and methods ranging from quantitative genetics to statistical physics55.
One important theoretical result is the no free lunch theorem. This states
that evolutionary algorithms are not generic super solvers but neither
is any other method, because there is no such thing56. Instead, an evolutionary algorithm is the second-best solver for any problem, meaning
that in many cases a carefully hand-crafted solver that exploits problem
characteristics is superior for the problem at hand, but that it might take
years to create that solver. A long-standing issue for theorists is algorithm
convergence. Early results were based on Markov-chain analysis and
addressed convergence in general57, but more recent work found specific relationships between algorithmic set-up and expected run times58.
Despite all the difficulties, the field is making progress in theory59,60.
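The no-free-lunch result referred to above can be stated compactly. In Wolpert and Macready's notation (reproduced here as an aid to the reader, not as part of this Review), d_m^y denotes the sequence of cost values sampled after m distinct evaluations, and for any pair of black-box algorithms a_1 and a_2:

```latex
% No-free-lunch theorem: summed over all objective functions f on
% finite domains, the distributions of sampled cost sequences coincide
\sum_{f} P\left(d_m^{y} \mid f, m, a_1\right)
  = \sum_{f} P\left(d_m^{y} \mid f, m, a_2\right)
```

That is, averaged over all possible objective functions, no search algorithm outperforms any other; superiority on one class of problems is necessarily paid for on another.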
algorithmic advances, this has spurred renewed interest in interactive evolutionary algorithms, which have been successfully applied to
elicit user preferences and knowledge in many areas from design to
art51 (http://picbreeder.org). Results suggest a useful synergy with periodic user interaction to incorporate preferences that help to focus the
search down to a more manageable set of dimensions78. Importantly,
this involves eliciting user preferences in response to what is discovered
to be possible, rather than a priori.
Generative and developmental representations
Further to the conventionally simple genotype–phenotype mappings,
the use of indirect encodings is gaining traction. Such generative
and developmental representations allow the reuse of code, which
helps to scale up the complexity of artificially evolved phenotypes, for
instance, in evolutionary robotics, artificial life and morphogenetic
engineering79–83.
Outlook
deficiency of software models, which lack the richness of matter that is
a source of challenges and opportunities not yet matched in artificial
algorithms96. Hence, they can provide new insights into fundamental
issues such as the factors influencing evolvability, resilience, the rate of
progress under various circumstances, or the co-evolution of mind and
body. Using a non-biochemical substrate for such research is becoming
technologically ever more feasible, and it increases the generalizability
of the findings. In particular, using a different medium for evolutionary
studies can separate generic principles and ground truth from effects
that are specific for carbon-based life as we know it.
Received 18 December 2014; accepted 24 March 2015.
1. Turing, A. M. in Machine Intelligence 5 (eds Meltzer, B. & Michie, D.) (Edinburgh
Univ. Press, 1969).
2. Fogel, L. J., Owens, A. J. & Walsh, M. J. Artificial Intelligence Through Simulated
Evolution (Wiley, 1966).
3. Rechenberg, I. Evolutionsstrategie: Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution [in German] (Frommann-Holzboog, 1973).
4. Schwefel, H.-P. Numerical Optimization of Computer Models (Birkhäuser, 1977).
5. Holland, J. H. Adaptation in Natural and Artificial Systems (Univ. Michigan Press,
1975).
6. Koza, J. R. Genetic Programming (MIT Press, 1992).
7. Eiben, A. E. & Smith, J. E. Introduction to Evolutionary Computing (Springer,
2003).
8. Ashlock, D. Evolutionary Computation for Modeling and Optimization (Springer,
2006).
9. De Jong, K. Evolutionary Computation: a Unified Approach (MIT Press, 2006).
10. Wang, C., Yu, S., Chen, W. & Sun, C. Highly efficient light-trapping structure
design inspired by natural evolution. Sci. Rep. 3, 1025 (2013).
11. Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental
data. Science 324, 81–85 (2009).
This paper provides a forceful demonstration of the power of evolutionary
methods for tasks that are thought to require highly educated scientists to
perform.
12. Eiben, A. E., Kernbach, S. & Haasdijk, E. Embodied artificial evolution: artificial
evolutionary systems in the 21st Century. Evol. Intel. 5, 261–272 (2012).
13. Eiben, A. E. in Parallel Problem Solving from Nature – PPSN XII (eds Filipic, B.,
Bartz-Beielstein, T., Branke, J. & Smith, J.) 24–39 (Springer, 2014).
14. Piperno, D. R., Ranere, A. J., Holst, I., Iriarte, J. & Dickau, R. Starch grain and
phytolith evidence for early ninth millennium B.P. maize from the Central
Balsas River Valley, Mexico. Proc. Natl Acad. Sci. USA 106, 5019–5024 (2009).
15. Akey, J. M. et al. Tracking footprints of artificial selection in the dog genome.
Proc. Natl Acad. Sci. USA 107, 1160–1165 (2010).
16. Dennett, D. Darwin's Dangerous Idea (Penguin, 1995).
17. Goldberg, D. Genetic Algorithms in Search, Optimization, and Machine Learning
(Addison-Wesley, 1989).
18. Fogel, D. B. Evolutionary Computation (IEEE, 1995).
19. Schwefel, H.-P. Evolution and Optimum Seeking (Wiley, 1995).
20. Bäck, T. Evolutionary Algorithms in Theory and Practice (Oxford Univ. Press,
1996).
21. Banzhaf, W., Nordin, P., Keller, R. E. & Francone, F. D. Genetic Programming: an
Introduction (Morgan Kaufmann, 1998).
22. Storn, R. & Price, K. Differential evolution – a simple and efficient heuristic
for global optimization over continuous spaces. J. Glob. Optim. 11, 341–359
(1997).
23. Price, K. V., Storn, R. N. & Lampinen, J. A. Differential Evolution: a Practical
Approach to Global Optimization (Springer, 2005).
24. Kennedy, J. & Eberhart, R. C. Particle swarm optimization. In Proc. IEEE
International Conference on Neural Networks 1942–1948 (IEEE, 1995).
25. Kennedy, J. & Eberhart, R. C. Swarm Intelligence (Morgan Kaufmann, 2001).
26. De Jong, K. A. Are genetic algorithms function optimizers? In Proc. 2nd
Conference on Parallel Problem Solving from Nature (eds Männer, R. &
Manderick, B.) 3–13 (North-Holland, 1992).
27. Hornby, G. S., Lohn, J. D. & Linden, D. S. Computer-automated evolution of an
X-band antenna for NASA's space technology 5 mission. Evol. Comput. 19, 1–23
(2011).
28. Arias-Montano, A., Coello, C. A. C. & Mezura-Montes, E. Multiobjective
evolutionary algorithms in aeronautical and aerospace engineering. IEEE Trans.
Evol. Comput. 16, 662–694 (2012).
29. Besnard, J. et al. Automated design of ligands to polypharmacological profiles.
Nature 492, 215–220 (2012).
30. Pošík, P., Huyer, W. & Pál, L. A comparison of global search algorithms for
continuous black box optimization. Evol. Comput. 20, 509–541 (2012).
31. Hansen, N. & Ostermeier, A. Completely derandomized self-adaptation in
evolution strategies. Evol. Comput. 9, 159–195 (2001).
This article introduced the CMA-ES algorithm, widely regarded as the state of
the art in numerical optimization.
32. Bäck, T., Foussette, C. & Krause, P. Contemporary Evolution Strategies (Springer,
2013).
33. Yao, X. Evolving artificial neural networks. Proc. IEEE 87, 1423–1447 (1999).
This landmark paper, which was the winner of the 2001 Institute of Electrical
and Electronics Engineers Donald G. Fink Prize Paper Award, brought
together different strands of research and drew attention to the potential
67. Serpell, M. & Smith, J. E. Self-adaptation of mutation operator and probability for
permutation representations in genetic algorithms. Evol. Comput. 18, 491–514
(2010).
68. Fialho, A., Da Costa, L., Schoenauer, M. & Sebag, M. Analyzing bandit-based
adaptive operator selection mechanisms. Ann. Math. Artif. Intell. 60, 25–64
(2010).
69. Karafotias, G., Hoogendoorn, M. & Eiben, A. E. Parameter control in evolutionary
algorithms: trends and challenges. IEEE Trans. Evol. Comput. 19, 167–187
(2015).
This is a recent follow-up to ref. 61.
70. Jin, Y. A comprehensive survey of fitness approximation in evolutionary
computation. Soft Comput. 9, 3–12 (2005).
71. Jin, Y. Surrogate-assisted evolutionary computation: recent advances and
future challenges. Swarm Evol. Comput. 1, 61–70 (2011).
72. Loshchilov, I., Schoenauer, M. & Sebag, M. Self-adaptive surrogate-assisted
covariance matrix adaptation evolution strategy. In Proc. Conference on Genetic
and Evolutionary Computation (eds Soule, T. & Moore, J. H.) 321–328 (ACM,
2012).
73. Zaefferer, M. et al. Efficient global optimization for combinatorial problems. In
Proc. Conference on Genetic and Evolutionary Computation (eds Igel, C. & Arnold,
D. V.) 871–878 (ACM, 2014).
74. Deb, K. Multi-objective Optimization Using Evolutionary Algorithms (Wiley, 2001).
75. Zhang, Q. & Li, H. MOEA/D: a multi-objective evolutionary algorithm based on
decomposition. IEEE Trans. Evol. Comput. 11, 712–731 (2007).
76. Deb, K. & Jain, H. An evolutionary many-objective optimization algorithm
using reference-point-based nondominated sorting approach, part I: solving
problems with box constraints. IEEE Trans. Evol. Comput. 18, 577–601 (2014).
77. Jain, H. & Deb, K. An evolutionary many-objective optimization algorithm using
reference-point based non-dominated sorting approach, part II: handling
constraints and extending to an adaptive approach. IEEE Trans. Evol. Comput.
18, 602–622 (2014).
78. Branke, J., Greco, S., Slowinski, R. & Zielniewicz, P. Learning value functions in
interactive evolutionary multiobjective optimization. IEEE Trans. Evol. Comput.
19, 88–102 (2015).
79. Stanley, K. O. Compositional pattern producing networks: a novel abstraction of
development. Genet. Program. Evolvable Mach. 8, 131–162 (2007).
80. O'Reilly, U.-M. & Hemberg, H. Integrating generative growth and evolutionary
computation for form exploration. Genet. Program. Evolvable Mach. 8, 163–186
(2007).
81. Clune, J., Stanley, K. O., Pennock, R. & Ofria, C. On the performance of indirect
encoding across the continuum of regularity. IEEE Trans. Evol. Comput. 15,
346–367 (2011).
COMMENT
CHEMISTRY Tracing the evolution of the lab, from furnace to fume hood p.422
CONSERVATION Deforestation soaring in the Amazon, satellite data show p.423
Ethics of artificial intelligence
Four leading researchers share their concerns and solutions for reducing societal risks from intelligent machines.
STUART RUSSELL
Take a stand on AI weapons
Professor of computer science, University of California, Berkeley
BAE Systems' Taranis drone has autonomous elements, but relies on humans for combat decisions.
NASA's Robonaut 2 could be used in medicine and industry as well as space-station construction.
SABINE HAUERT
Shape the debate,
don't shy from it
Lecturer in robotics, University of
Bristol
Irked by hyped headlines that foster fear
or overinflate expectations of robotics and
artificial intelligence (AI), some researchers have stopped communicating with the
RUSS ALTMAN
Distribute AI
benefits fairly
Professor of bioengineering, genetics,
medicine and computer science,
Stanford University
Artificial intelligence (AI) has astounding
potential to accelerate scientific discovery
in biology and medicine, and to transform
health care. AI systems promise to help
make sense of several new types of data:
measurements from the 'omics', such as
genomics, proteomics and metabolomics;
electronic health records; and digital-sensor monitoring of health signs.
Clustering analyses can define new
syndromes: separating diseases that
were thought to be the same and unifying others that have the same underlying
defects. Pattern-recognition technologies
may match disease states to optimal treatments. For example, my colleagues and I
are identifying groups of patients who are
likely to respond to drugs that regulate the
immune system on the basis of clinical and
transcriptomic features.
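A clustering analysis of the kind described, grouping patients so that one presumed disease splits into distinct syndromes, can be sketched with a minimal k-means on synthetic data. Everything below (the two features, the two synthetic patient groups, the choice of k = 2) is an illustrative assumption, not a description of the author's actual method:

```python
import random

random.seed(4)

def kmeans(points, k=2, iters=10):
    """Minimal k-means: alternately assign each point to its nearest
    centre, then move each centre to the mean of its cluster."""
    centres = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
            clusters[i].append(p)
        centres = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres, clusters

# Synthetic 'patients' with two hypothetical lab-value features, drawn from
# two underlying groups that a clinician might otherwise lump together
group_a = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(30)]
group_b = [(random.gauss(4, 0.5), random.gauss(4, 0.5)) for _ in range(30)]
centres, clusters = kmeans(group_a + group_b, k=2)
```

Real analyses of clinical and transcriptomic data would of course involve many more features, careful normalization and validation against outcomes; the point here is only the mechanism by which clustering can surface subgroups.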
Kirobo, Japan's first robot astronaut, was deployed to the International Space Station in 2013.
MANUELA VELOSO
Embrace a robot–human world
Professor of computer science,
Carnegie Mellon University
Humans seamlessly integrate perception,
cognition and action. We use our sensors
to assess the state of the world, our brains
to think and choose actions to achieve
objectives, and our bodies to execute those
actions. My research team is trying to build
robots that are capable of doing the same
with artificial sensors (cameras, microphones and scanners), algorithms and actuators, which control the mechanisms.