...recognizing objects and faces.
Timeline:
- Dalvi et al. 2004, "Adversarial Classification": fooling a spam filter
- Biggio et al. 2013, "Evasion Attacks Against Machine Learning at Test Time": fooling neural nets
- Szegedy et al. 2013: fooling ImageNet classifiers imperceptibly
- Goodfellow et al. 2014: a cheap, closed-form attack
(Goodfellow 2016)
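The 2014 attack (the fast gradient sign method) is cheap because it needs only one gradient: perturb every input dimension by ε in the direction that increases the loss. A minimal NumPy sketch, assuming the loss gradient with respect to the input has already been computed and inputs live in [0, 1]:

```python
import numpy as np

def fgsm(x, grad, eps=0.25):
    # Fast gradient sign method: take a uniform step of size eps in the
    # direction that increases the loss, then clip back to valid pixels.
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)
```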
Turning Objects into Airplanes
Attacking a Linear Model
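Why a linear model is so easy to attack: against a score w·x, the perturbation ε·sign(w) shifts the score by ε‖w‖₁, which grows with input dimension, so many imperceptible per-feature changes add up to a large change in the output. A toy sketch (the dimension, weight scale, and ε are made-up numbers):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                    # input dimensionality (e.g. pixels)
w = rng.choice([-1.0, 1.0], size=n) * 0.01  # many small weights
x = rng.uniform(0.0, 1.0, size=n)

eps = 0.01                    # imperceptible change per feature
x_adv = x + eps * np.sign(w)  # align every feature with its weight's sign

# The score shifts by exactly eps * ||w||_1, which scales with n:
shift = w @ x_adv - w @ x
print(shift)                  # eps * ||w||_1 = 0.01 * 10 ≈ 0.1
```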
Not just for neural nets
Linear models
Logistic regression
Softmax regression
SVMs
Decision trees
Nearest neighbors
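None of these models needs gradients to be attacked; any small step that crosses the decision boundary will do. A toy sketch for a 1-nearest-neighbour classifier (the data and step size are invented for illustration):

```python
import numpy as np

# Toy 2-D training set: two well-separated classes.
X = np.array([[0.0, 0.0], [0.1, 0.0],   # class 0
              [1.0, 1.0], [0.9, 1.0]])  # class 1
y = np.array([0, 0, 1, 1])

def predict_1nn(x):
    # Label of the single nearest training point.
    return int(y[np.argmin(np.linalg.norm(X - x, axis=1))])

x = np.array([0.05, 0.0])               # comfortably inside class 0

# Step just past the midpoint toward the nearest class-1 point:
nearest_1 = X[y == 1][np.argmin(np.linalg.norm(X[y == 1] - x, axis=1))]
x_adv = x + 0.55 * (nearest_1 - x)
```

For 1-NN the required step is about half the gap between classes; the surprising finding for deep nets is how small the corresponding step turns out to be.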
Adversarial Examples from Overfitting
[Figure: under the overfitting view, an overly complex decision boundary fits the x and O training points but misbehaves in the regions between them]
Adversarial Examples from
Excessive Linearity
[Figure: under the linearity view, a linear decision boundary fits the x and O training points but extrapolates its errors far beyond the data]
Modern deep nets are very (piecewise) linear
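The piecewise linearity is easy to check directly: along any line in input space, a ReLU network's pre-softmax output is piecewise linear, so within one region three collinear points obey the midpoint rule exactly. A tiny sketch with hand-picked weights (chosen so the sweep stays inside a single linear region):

```python
import numpy as np

# Hand-picked weights so every ReLU pre-activation stays safely away
# from zero near x0 (keeping the check inside one linear region).
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]])
b1 = np.array([0.5, -0.2, 0.1])
W2 = np.array([[1.0, -2.0, 0.5], [0.3, 1.0, -1.0]])
b2 = np.array([0.0, 0.1])

def logits(x):
    # Two-layer ReLU net: the output (the argument to the softmax)
    # is a piecewise linear function of the input.
    h = np.maximum(0.0, W1 @ x + b1)
    return W2 @ h + b2

x0 = np.array([0.3, 0.4])
v = np.array([1.0, 1.0])   # direction to sweep along
t = 0.01

# Inside one linear region the midpoint rule holds exactly:
assert np.allclose(logits(x0), 0.5 * (logits(x0 - t * v) + logits(x0 + t * v)))
```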
Nearly Linear Responses in Practice
[Figure: the argument to the softmax varies nearly linearly along an adversarial direction]
Small inter-class distances
[Figure: clean example + perturbation = corrupted example]
Maps of Adversarial and Random Cross-Sections
Maps of Random Cross-Sections
Adversarial examples are not noise
Wrong almost everywhere
Adversarial Examples for RL
Signs of weights
Linear Models of ImageNet
RBFs behave more intuitively
Cross-model, cross-dataset generalization
Cross-technique transferability
(Papernot 2016)
Transferability Attack
Target model with unknown weights, machine learning algorithm, and training set; maybe a non-differentiable function
→ Train your own model
→ Substitute model mimicking the target model with a known, differentiable function
(Papernot 2016)
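The attack above can be sketched end to end: query the black-box target for labels, fit a differentiable substitute, run a gradient attack on the substitute, and hope it transfers. Here the target is a hidden threshold rule and the substitute is logistic regression trained by gradient descent (both stand-ins: a real attack would query the victim's API and train a neural substitute):

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Black-box target: we can only query labels (here, a hidden linear
    # threshold rule standing in for any non-differentiable model).
    return float(x @ np.array([2.0, -1.0]) > 0.5)

# 1. Build a training set by querying the target for labels.
X = rng.normal(size=(500, 2))
y = np.array([target(row) for row in X])

# 2. Fit a differentiable substitute (logistic regression, plain GD).
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    w -= 0.1 * X.T @ (p - y) / len(X)
    b -= 0.1 * float(np.mean(p - y))

# 3. Attack the substitute with FGSM: to push its score for class 1
#    down, step against the sign of the substitute's weights.
x = np.array([1.0, 0.0])        # the target labels this point 1
x_adv = x - 0.6 * np.sign(w)

# 4. Check whether the attack transfers to the black box.
print(target(x), target(x_adv))
```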
Enhancing Transfer With Ensembles
Adversarial Examples in the Physical World
Universal approximator theorem
Neural nets can represent either function:
Maximum likelihood doesn't cause them to learn the right function. But we can fix that...
Training on Adversarial Examples
[Figure: test misclassification rate (log scale, 10^0 down to 10^-2) over training, for four conditions: Train=Clean/Test=Clean, Train=Clean/Test=Adv, Train=Adv/Test=Clean, Train=Adv/Test=Adv]
Adversarial Training of Other Models
Linear models (SVMs, linear regression) cannot learn a step function, so adversarial training is less useful for them; it ends up very similar to weight decay.
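For logistic regression the similarity can be made precise (following the derivation in Goodfellow et al. 2014): with labels y ∈ {−1, +1} and softplus ζ, training on worst-case L∞ perturbations of size ε turns the loss into

```latex
J(w,b) = \mathbb{E}_{x,y}\,\zeta\!\left(-y\,(w^\top x + b)\right),
\qquad \zeta(z) = \log\!\left(1 + e^{z}\right)
% The worst-case perturbation \eta = -\epsilon\, y\, \operatorname{sign}(w),
% with \|\eta\|_\infty \le \epsilon, gives
\tilde{J}(w,b) = \mathbb{E}_{x,y}\,\zeta\!\left(\epsilon \lVert w \rVert_1 - y\,(w^\top x + b)\right)
```

so the L¹ penalty resembles weight decay, but it is added to the argument of ζ rather than to the loss, and therefore fades out once the model classifies a point with a confident margin.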
Weaknesses Persist
Adversarial Training
[Figure: an image labeled as bird is perturbed to decrease the probability of the bird class; the perturbed image still has the same true label (bird)]
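A sketch of one adversarial-training step for a logistic-regression model: generate FGSM examples against the current parameters, then descend on a mix of the clean and adversarial losses (ε, α, and the toy data are assumptions; deep nets do the same thing with backprop through the network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adv_train_step(w, X, y, eps=0.25, alpha=0.5, lr=0.1):
    # Build FGSM examples against the current w: the input-gradient of
    # the logistic loss is (p - y) * w, so perturb by eps * its sign.
    p = sigmoid(X @ w)
    X_adv = X + eps * np.sign(np.outer(p - y, w))
    # Descend on alpha * clean loss + (1 - alpha) * adversarial loss
    # (X_adv is treated as constant, as in standard adversarial training).
    p_adv = sigmoid(X_adv @ w)
    grad = (alpha * X.T @ (p - y) + (1 - alpha) * X_adv.T @ (p_adv - y)) / len(X)
    return w - lr * grad

# Toy separable problem (data and labels are invented).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w > 0).astype(float)

w = np.zeros(3)
for _ in range(500):
    w = adv_train_step(w, X, y)
```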
Virtual Adversarial Training
[Figure: on an unlabeled image the model guesses it's probably a bird, maybe a plane; after an adversarial perturbation intended to change the guess, the new guess should still match the old one (probably bird, maybe plane)]
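Virtual adversarial training replaces the label with the model's own current prediction: penalize the divergence between p(y|x) and p(y|x+r) for the perturbation r within a small ball that changes the prediction most. A minimal sketch of just the VAT penalty for a fixed toy softmax model, approximating the worst-case r by the best of a few random directions (an assumption for brevity; the method in the literature uses a power-iteration estimate):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    # KL divergence between two discrete distributions.
    return float(np.sum(p * np.log(p / q)))

W = np.array([[2.0, -1.0], [-1.0, 2.0]])   # toy 2-class linear model

def predict(x):
    return softmax(W @ x)

def vat_penalty(x, eps=0.1, n_dirs=64, seed=0):
    # Approximate max over ||r|| <= eps of KL( p(.|x) || p(.|x+r) )
    # by sampling random unit directions of length eps.
    rng = np.random.default_rng(seed)
    p = predict(x)
    best = 0.0
    for _ in range(n_dirs):
        r = rng.normal(size=x.shape)
        r *= eps / np.linalg.norm(r)
        best = max(best, kl(p, predict(x + r)))
    return best

x = np.array([0.5, 0.4])       # unlabeled point: no y needed anywhere above
print(vat_penalty(x))
```

Note that no label appears anywhere in the penalty, which is what lets VAT use unlabeled data.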
Text Classification with VAT
[Bar chart: RCV1 misclassification rate (%) for several methods (7.70, 7.40, 7.20, 7.12, 7.05, 6.97, 6.68), with the best result, 6.68, achieved with virtual adversarial training]
Conclusion
Attacking is easy
Defending is difficult
cleverhans
Open-source library available at:
https://github.com/openai/cleverhans
Built on top of TensorFlow (Theano support anticipated)
Standard implementations of attacks, for adversarial training and reproducible benchmarks