
CPE 520 Neural Networks: Learning From Data Through Support Vector Machines

Liang Tian
tian@csee.wvu.edu
Lane Department of Computer Science & Electrical Engineering
West Virginia University
November 16, 2004

Neural Networks BP Learning

P. Klinkhachorn. CpE520 Lecture Notes, CSEE Dept, West Virginia University.

BP Learning Procedure

P. Klinkhachorn. CpE520 Lecture Notes, CSEE Dept, West Virginia University.

Classifier

A. Moore. Lecture Notes, School of Computer Science, CMU, http://www.cs.cmu.edu/~awm/tutorials.

Classifier
MLP

Margin

SVM

Classifier
MLP stops training when all points are correctly classified

The decision surface may not be optimal
The generalization error may not be minimized

Local Minima
MLP: gradient-descent learning is a non-linear optimization, so it can get trapped in local minima

S. Bengio. An Introduction to Statistical Machine Learning Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio May. 2003.

SVM Classification

R. Collobert. An Introduction to Statistical Machine Learning Support Vector Machines. IDIAP. Available at http://www.idiap.ch/~collober. Jan. 2003.


SVM Classification

Margin Maximization

Correct Classification

SVM Classification
Classic non-linear optimization problem with inequality constraints

Solved by maximizing the Lagrangian with respect to the dual variables

Lagrange function

Subject to constraints
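The Lagrange function and its constraints appeared as typeset equations on the original slide and were lost in extraction; the standard hard-margin formulation they refer to reads (a reconstruction, not the slide's own equations):

```latex
% Lagrange function of the primal problem
% minimize (1/2)||w||^2  subject to  y_i (w^T x_i + b) >= 1
L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{l} \alpha_i \left[ y_i \left( w^{\top} x_i + b \right) - 1 \right]

% Eliminating w and b gives the dual, maximized over the dual variables:
L_D(\alpha) = \sum_{i=1}^{l} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
    \alpha_i \alpha_j \, y_i y_j \, x_i^{\top} x_j

% subject to the constraints
\alpha_i \ge 0, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0
```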

SVM Classification
The solutions for the Lagrange multipliers αi determine the parameters w and b

The final decision hyperplane is an indicator function

Similar to the weighted sum in an MLP
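The indicator function itself was an equation on the slide; in the standard formulation it is (reconstructed):

```latex
% Optimal weights recovered from the multipliers
w = \sum_{i=1}^{l} \alpha_i y_i x_i

% Decision hyperplane as an indicator (sign) function --
% a weighted sum over the support vectors, as in an MLP:
f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i \, x_i^{\top} x + b \right)
```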

SVM Classification
If the data are not linearly separable, it is easier to separate the two classes by projecting the data into a higher-dimensional space.


SVM Classification
[Figure: network view of the mapping. Inputs x1, x2 in the input space are mapped by features φi(x) into the Z-space, with b = w0 the bias; the output is y = Σ wi φi(x) + b = wᵀφ(x) + b]

SVM Classification
Problem?
Computation becomes prohibitive if the dimensionality of the Z-space is very large
Solution: introduce kernel functions to simplify the computation

The kernel function is evaluated in the input space
This bypasses the high dimensionality of the feature space
Common kernels are polynomial and Gaussian
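The kernel trick above can be sketched in a few lines of Python (a minimal illustration; the degree-2 polynomial kernel, its explicit feature map, and the sample points are assumptions for a 2-D input space, not taken from the slides):

```python
import math

def poly_kernel(x, z):
    # Degree-2 polynomial kernel K(x, z) = (x . z + 1)^2,
    # evaluated entirely in the 2-D input space.
    return (x[0] * z[0] + x[1] * z[1] + 1) ** 2

def phi(x):
    # Explicit map of a 2-D point into the 6-D feature (Z) space
    # implicitly used by the degree-2 polynomial kernel.
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, s * x1, s * x2, x1 * x1, s * x1 * x2, x2 * x2]

x, z = (1.0, 2.0), (3.0, -1.0)
in_feature_space = sum(a * b for a, b in zip(phi(x), phi(z)))
in_input_space = poly_kernel(x, z)
# The two values agree, so the high-dimensional dot product is never
# computed explicitly -- this is what "bypassing the high
# dimensionality of the feature space" means.
print(in_input_space, in_feature_space)
```

For a Gaussian kernel the corresponding feature space is infinite-dimensional, so evaluating in the input space is not merely cheaper but the only option.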

SVM Learning Example


Classic XOR Problem

SVM Learning Example


Polynomial kernel function K
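The classic XOR problem above has a well-known closed-form SVM solution (inputs coded as ±1, degree-2 polynomial kernel, all four points support vectors with αi = 1/8 and b = 0). The sketch below uses that textbook solution, not values recovered from the slides:

```python
# XOR with inputs coded as +/-1; the labels satisfy y = -x1*x2.
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [-1, 1, 1, -1]

def K(u, v):
    # Degree-2 polynomial kernel
    return (1 + u[0] * v[0] + u[1] * v[1]) ** 2

# Known dual solution for this classic problem: every training point
# is a support vector, with alpha_i = 1/8 and bias b = 0.
alpha = [0.125, 0.125, 0.125, 0.125]
b = 0.0

def f(x):
    # Kernel expansion to which the sign indicator function is applied
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

# Every training point sits exactly on the margin: f(x_i) = y_i.
print([f(xi) for xi in X])  # [-1.0, 1.0, 1.0, -1.0]
```

Expanding the kernel sum symbolically gives f(x) = -x1*x2, which separates XOR even though no linear boundary in the input space can.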

SVM Learning Procedure


Step 1: Select the kernel function
Step 2: Present inputs and desired outputs
Step 3: Solve for the Lagrange multipliers αi through an optimization problem
Step 4: Obtain the decision indicator function

BP Learning Procedure

P. Klinkhachorn. CpE520 Lecture Notes, CSEE Dept, West Virginia University.

SVM vs. NN

This is a NN
V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

SVM vs. NN

This is a SVM
V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

SVM vs. NN

There is NO difference in structure. HOWEVER, there is an important difference in LEARNING!

SVM vs. NN
SVM is a novel type of machine learning algorithm developed by V. Vapnik.

SVM minimizes an upper bound on the generalization error.


Conventional neural networks only minimize the error on the training data.

SVM training yields a unique, global solution and avoids being trapped in local minima.
[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN: 0-387-94559-8.

SVM Applications
OCR (optical character recognition): error rate of 0.6%

Muller et al. An introduction to kernel-based learning algorithms, IEEE Trans. NN, 12(2), 2001, pp.181-201.

SVM Applications
DNA Data Analysis

Muller et al. An introduction to kernel-based learning algorithms, IEEE Trans. NN, 12(2), 2001, pp.181-201.

SVM Applications
Single-Class Classification

Tax and Duin, Outliers and data descriptions, Proceedings of the 7th Annual Conference of the Advanced School for Computing and Imaging, 2001. Pp. 234-241.

Two Types of Problems

Regression

Classification

S. Bengio. An Introduction to Statistical Machine Learning Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio May. 2003.

SVM Regression

V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

SVM Regression
Approximating a set of data consisting of l pairs of training patterns

The SVM model used for function approximation is:

where φ(x) is the high-dimensional feature vector that is nonlinearly mapped from the input space x.
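The model equation itself did not survive extraction; in this notation the SVM approximation function is (a reconstruction):

```latex
f(x, w) = w^{\top} \varphi(x) + b
```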

L. Tian and A. Noore, A novel approach for short-term load forecasting using support vector machines, International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.

SVM Regression
w and b can be estimated by minimizing the following regularized risk function

Vapnik's linear loss function with an ε-insensitivity zone
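The risk function and loss were equations on the original slide; the standard form they describe is (reconstructed):

```latex
% Regularized risk minimized over w and b
R(w, b) = \tfrac{1}{2} \lVert w \rVert^{2}
  + C \sum_{i=1}^{l} \left| y_i - f(x_i, w) \right|_{\varepsilon}

% Vapnik's linear loss with epsilon-insensitivity zone:
% errors smaller than epsilon are ignored
\left| \xi \right|_{\varepsilon} = \max\left( 0, \; \lvert \xi \rvert - \varepsilon \right)
```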


SVM Regression
‖w‖² is the squared norm of the weight vector, used to constrain the model structure capacity in order to obtain better generalization performance. C is the regularization constant, representing the trade-off between the approximation error and the model structure.


SVM Regression
Minimizing the risk objective function R
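The constrained optimization problem shown on this slide was lost in extraction; the standard primal with slack variables ξ, ξ* is (a reconstruction):

```latex
\min_{w,\, b,\, \xi,\, \xi^{*}} \;
  \tfrac{1}{2} \lVert w \rVert^{2}
  + C \sum_{i=1}^{l} \left( \xi_i + \xi_i^{*} \right)

\text{subject to} \quad
\begin{cases}
  y_i - w^{\top} \varphi(x_i) - b \le \varepsilon + \xi_i, \\
  w^{\top} \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
  \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, l.
\end{cases}
```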


SVM Regression
Then, the solution is given in the form:

Training examples with (αi − αi*) ≠ 0 are support vectors
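The solution equation was lost in extraction; the standard SVM regression expansion it refers to is (reconstructed):

```latex
f(x) = \sum_{i=1}^{l} \left( \alpha_i - \alpha_i^{*} \right) K(x_i, x) + b
```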



SVM Regression
The Lagrange multipliers αi and αi* can be obtained by maximizing the form:
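The dual objective was an equation on the slide; in the standard ε-SVR formulation it reads (a reconstruction, consistent with the solution form above):

```latex
W(\alpha, \alpha^{*}) =
  - \varepsilon \sum_{i=1}^{l} \left( \alpha_i + \alpha_i^{*} \right)
  + \sum_{i=1}^{l} y_i \left( \alpha_i - \alpha_i^{*} \right)
  - \tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
    \left( \alpha_i - \alpha_i^{*} \right) \left( \alpha_j - \alpha_j^{*} \right) K(x_i, x_j)

% subject to
\sum_{i=1}^{l} \left( \alpha_i - \alpha_i^{*} \right) = 0, \qquad
0 \le \alpha_i,\ \alpha_i^{*} \le C
```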


Regression Application - 1 Short-Term Load Forecasting

L. Tian and A. Noore, A novel approach for short-term load forecasting using support vector machines, International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.


Regression Application - 2 Software Reliability Prediction

Average Error = 1.20%

L. Tian and A. Noore, On-line software reliability prediction: An approach based on support vector machines, International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.

Regression Application - 2 Software Reliability Prediction

L. Tian and A. Noore, On-line software reliability prediction: An approach based on support vector machines, International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.

Parameter Selection

Cao and Tay, Support vector machine with adaptive parameters in financial time series forecasting, IEEE Trans. NN, 14(6), Nov. 2003, pp. 1506-1518.

Summary

Both NN and SVM learn from experimental data
Both NN and SVM are universal approximators
After learning, NN and SVM share the same mathematical model and graphical representation
The only difference is the learning method:

NN: gradient descent
SVM: solving a quadratic programming problem

SVM Research Issues

Speeding up training when the data set is large: chunking, using subsets of the data, improving optimization techniques

Parameter selection and optimization
Modified and adaptive SVMs and other variations

References and Further Reading


[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN: 0-387-94559-8.
[2] S. Bengio. An Introduction to Statistical Machine Learning: Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio, May 2003.
[3] V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

References and Further Reading


[4] R. Collobert. An Introduction to Statistical Machine Learning: Support Vector Machines. IDIAP. Available at http://www.idiap.ch/~collober, Jan. 2003.
[5] L. Tian and A. Noore. A Novel Approach for Short-Term Load Forecasting Using Support Vector Machines. International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.
[6] L. Tian and A. Noore. On-line Software Reliability Prediction: An Approach Based on Support Vector Machines. International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.

References and Further Reading


[7] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998. ISBN: 0-471-03003-1.
[8] http://www.kernel-machines
[9] http://www.support-vector.ws

Questions and Comments ?

Thank You !!
