
CPE 520 Neural Networks: Learning From Data Through Support Vector Machines

Liang Tian
tian@csee.wvu.edu
Lane Department of Computer Science & Electrical Engineering
West Virginia University
November 16, 2004

Neural Networks BP Learning

P. Klinkhachorn. CpE520 Lecture Notes, CSEE Dept, West Virginia University.

BP Learning Procedure

P. Klinkhachorn. CpE520 Lecture Notes, CSEE Dept, West Virginia University.

Classifier

A. Moore. Lecture Notes, School of Computer Science, CMU, http://www.cs.cmu.edu/~awm/tutorials.

Classifier
MLP

Margin

SVM

Classifier
MLP stops training when all points are correctly classified

The decision surface may not be optimal
The generalization error may not be minimized

Local Minima
MLP: gradient-descent learning is a non-linear optimization, so it can get trapped in local minima

S. Bengio. An Introduction to Statistical Machine Learning Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio May. 2003.

SVM Classification

R. Collobert. An Introduction to Statistical Machine Learning Support Vector Machines. IDIAP. Available at http://www.idiap.ch/~collober. Jan. 2003.


SVM Classification

Margin Maximization

Correct Classification

SVM Classification
Classic non-linear optimization problem with inequality constraints

Solved by maximizing the Lagrangian with respect to the dual variables

Lagrange function

Subject to constraints
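The Lagrange function and its constraints appeared as typeset equations on the original slide and were lost in extraction; the standard hard-margin formulation they refer to reads (a reconstruction, not the slide's own equations):

```latex
% Lagrange function of the primal problem
% minimize (1/2)||w||^2  subject to  y_i (w^T x_i + b) >= 1
L(w, b, \alpha) = \tfrac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{l} \alpha_i \left[ y_i \left( w^{\top} x_i + b \right) - 1 \right]

% Eliminating w and b gives the dual, maximized over the dual variables:
L_D(\alpha) = \sum_{i=1}^{l} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
    \alpha_i \alpha_j \, y_i y_j \, x_i^{\top} x_j

% subject to the constraints
\alpha_i \ge 0, \qquad \sum_{i=1}^{l} \alpha_i y_i = 0
```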

SVM Classification
The solutions for the Lagrange multipliers αi determine the parameters w and b

The final decision hyperplane is an indicator function

Similar to the weighted sum in an MLP
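The indicator function itself was an equation on the slide; in the standard formulation it is (reconstructed):

```latex
% Optimal weights recovered from the multipliers
w = \sum_{i=1}^{l} \alpha_i y_i x_i

% Decision hyperplane as an indicator (sign) function --
% a weighted sum over the support vectors, as in an MLP:
f(x) = \operatorname{sign}\left( \sum_{i=1}^{l} \alpha_i y_i \, x_i^{\top} x + b \right)
```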

SVM Classification
If the data are not linearly separable, it is easier to separate the two classes by projecting the data into a higher-dimensional space.


SVM Classification
[Figure: network view of the mapping. Inputs x1, x2 in the input space are mapped by features φi(x) into the Z-space, with b = w0 the bias; the output is y = Σ wi φi(x) + b = wᵀφ(x) + b]

SVM Classification
Problem?
Computation becomes prohibitive if the dimensionality of the Z-space is very large
Solution: introduce kernel functions to simplify the computation

The kernel function is evaluated in the input space
This bypasses the high dimensionality of the feature space
Common kernels are polynomial and Gaussian
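The kernel trick above can be sketched in a few lines of Python (a minimal illustration; the degree-2 polynomial kernel, its explicit feature map, and the sample points are assumptions for a 2-D input space, not taken from the slides):

```python
import math

def poly_kernel(x, z):
    # Degree-2 polynomial kernel K(x, z) = (x . z + 1)^2,
    # evaluated entirely in the 2-D input space.
    return (x[0] * z[0] + x[1] * z[1] + 1) ** 2

def phi(x):
    # Explicit map of a 2-D point into the 6-D feature (Z) space
    # implicitly used by the degree-2 polynomial kernel.
    x1, x2 = x
    s = math.sqrt(2.0)
    return [1.0, s * x1, s * x2, x1 * x1, s * x1 * x2, x2 * x2]

x, z = (1.0, 2.0), (3.0, -1.0)
in_feature_space = sum(a * b for a, b in zip(phi(x), phi(z)))
in_input_space = poly_kernel(x, z)
# The two values agree, so the high-dimensional dot product is never
# computed explicitly -- this is what "bypassing the high
# dimensionality of the feature space" means.
print(in_input_space, in_feature_space)
```

For a Gaussian kernel the corresponding feature space is infinite-dimensional, so evaluating in the input space is not merely cheaper but the only option.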

SVM Learning Example


Classic XOR Problem

SVM Learning Example


Polynomial kernel function K
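The classic XOR problem above has a well-known closed-form SVM solution (inputs coded as ±1, degree-2 polynomial kernel, all four points support vectors with αi = 1/8 and b = 0). The sketch below uses that textbook solution, not values recovered from the slides:

```python
# XOR with inputs coded as +/-1; the labels satisfy y = -x1*x2.
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [-1, 1, 1, -1]

def K(u, v):
    # Degree-2 polynomial kernel
    return (1 + u[0] * v[0] + u[1] * v[1]) ** 2

# Known dual solution for this classic problem: every training point
# is a support vector, with alpha_i = 1/8 and bias b = 0.
alpha = [0.125, 0.125, 0.125, 0.125]
b = 0.0

def f(x):
    # Kernel expansion to which the sign indicator function is applied
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

# Every training point sits exactly on the margin: f(x_i) = y_i.
print([f(xi) for xi in X])  # [-1.0, 1.0, 1.0, -1.0]
```

Expanding the kernel sum symbolically gives f(x) = -x1*x2, which separates XOR even though no linear boundary in the input space can.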

SVM Learning Procedure


Step 1: Select the kernel function
Step 2: Present inputs and desired outputs
Step 3: Solve for the Lagrange multipliers αi through an optimization problem
Step 4: Obtain the decision indicator function

BP Learning Procedure

P. Klinkhachorn. CpE520 Lecture Notes, CSEE Dept, West Virginia University.

SVM vs. NN

This is a NN
V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

SVM vs. NN

This is a SVM
V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

SVM vs. NN

There is NO difference in structure. HOWEVER, there is an important difference in LEARNING!

SVM vs. NN
SVM is a novel type of machine learning algorithm developed by V. Vapnik.

SVM minimizes an upper bound on the generalization error.


Conventional neural networks only minimize the error on the training data.

SVM training yields a unique, global solution and avoids being trapped in local minima.
[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN: 0-387-94559-8.

SVM Applications
OCR (optical character recognition): error rate of 0.6%

Muller et al. An introduction to kernel-based learning algorithms, IEEE Trans. NN, 12(2), 2001, pp.181-201.

SVM Applications
DNA Data Analysis

Muller et al. An introduction to kernel-based learning algorithms, IEEE Trans. NN, 12(2), 2001, pp.181-201.

SVM Applications
Single-Class Classification

Tax and Duin, Outliers and data descriptions, Proceedings of the 7th Annual Conference of the Advanced School for Computing and Imaging, 2001. Pp. 234-241.

Two Types of Problems

Regression

Classification

S. Bengio. An Introduction to Statistical Machine Learning Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio May. 2003.

SVM Regression

V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

SVM Regression
Approximating a set of data consisting of l pairs of training patterns

The SVM model used for function approximation is:

where φ(x) is the high-dimensional feature vector that is nonlinearly mapped from the input space x.
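The model equation itself did not survive extraction; in this notation the SVM approximation function is (a reconstruction):

```latex
f(x, w) = w^{\top} \varphi(x) + b
```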

L. Tian and A. Noore, A novel approach for short-term load forecasting using support vector machines, International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.

SVM Regression
w and b can be estimated by minimizing the following regularized risk function

Vapnik's linear loss function with an ε-insensitivity zone
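The risk function and loss were equations on the original slide; the standard form they describe is (reconstructed):

```latex
% Regularized risk minimized over w and b
R(w, b) = \tfrac{1}{2} \lVert w \rVert^{2}
  + C \sum_{i=1}^{l} \left| y_i - f(x_i, w) \right|_{\varepsilon}

% Vapnik's linear loss with epsilon-insensitivity zone:
% errors smaller than epsilon are ignored
\left| \xi \right|_{\varepsilon} = \max\left( 0, \; \lvert \xi \rvert - \varepsilon \right)
```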


SVM Regression
‖w‖² is the squared norm of the weight vector, used to constrain the model structure capacity in order to obtain better generalization performance. C is the regularization constant, representing the trade-off between the approximation error and the model structure.


SVM Regression
Minimizing the risk objective function R
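The constrained optimization problem shown on this slide was lost in extraction; the standard primal with slack variables ξ, ξ* is (a reconstruction):

```latex
\min_{w,\, b,\, \xi,\, \xi^{*}} \;
  \tfrac{1}{2} \lVert w \rVert^{2}
  + C \sum_{i=1}^{l} \left( \xi_i + \xi_i^{*} \right)

\text{subject to} \quad
\begin{cases}
  y_i - w^{\top} \varphi(x_i) - b \le \varepsilon + \xi_i, \\
  w^{\top} \varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
  \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, l.
\end{cases}
```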


SVM Regression
Then, the solution is given in the form:

Training examples with (αi − αi*) ≠ 0 are support vectors
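The solution equation was lost in extraction; the standard SVM regression expansion it refers to is (reconstructed):

```latex
f(x) = \sum_{i=1}^{l} \left( \alpha_i - \alpha_i^{*} \right) K(x_i, x) + b
```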



SVM Regression
The Lagrange multipliers αi and αi* can be obtained by maximizing the form:
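The dual objective was an equation on the slide; in the standard ε-SVR formulation it reads (a reconstruction, consistent with the solution form above):

```latex
W(\alpha, \alpha^{*}) =
  - \varepsilon \sum_{i=1}^{l} \left( \alpha_i + \alpha_i^{*} \right)
  + \sum_{i=1}^{l} y_i \left( \alpha_i - \alpha_i^{*} \right)
  - \tfrac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l}
    \left( \alpha_i - \alpha_i^{*} \right) \left( \alpha_j - \alpha_j^{*} \right) K(x_i, x_j)

% subject to
\sum_{i=1}^{l} \left( \alpha_i - \alpha_i^{*} \right) = 0, \qquad
0 \le \alpha_i,\ \alpha_i^{*} \le C
```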


Regression Application - 1 Short-Term Load Forecasting

L. Tian and A. Noore, A novel approach for short-term load forecasting using support vector machines, International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.


Regression Application - 2 Software Reliability Prediction

Average Error = 1.20%

L. Tian and A. Noore, On-line software reliability prediction: An approach based on support vector machines, International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.

Regression Application - 2 Software Reliability Prediction

L. Tian and A. Noore, On-line software reliability prediction: An approach based on support vector machines, International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.

Parameter Selection

Cao and Tay, Support vector machine with adaptive parameters in financial time series forecasting, IEEE Trans. NN, 14(6), Nov. 2003, pp. 1506-1518.

Summary

Both NN and SVM learn from experimental data
Both NN and SVM are universal approximators
After learning, NN and SVM share the same mathematical model and graphical representation
The only difference is the learning method:

NN: gradient descent
SVM: solving a quadratic programming problem

SVM Research Issues

Speeding up training when the data set is large: chunking, using subsets of the data, improving optimization techniques

Parameter selection and optimization
Modified and adaptive SVMs and other variations

References and Further Reading


[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN: 0-387-94559-8.
[2] S. Bengio. An Introduction to Statistical Machine Learning: Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio, May 2003.
[3] V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.

References and Further Reading


[4] R. Collobert. An Introduction to Statistical Machine Learning: Support Vector Machines. IDIAP. Available at http://www.idiap.ch/~collober, Jan. 2003.
[5] L. Tian and A. Noore. A Novel Approach for Short-Term Load Forecasting Using Support Vector Machines. International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.
[6] L. Tian and A. Noore. On-line Software Reliability Prediction: An Approach Based on Support Vector Machines. International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.

References and Further Reading


[7] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998. ISBN: 0-471-03003-1.
[8] http://www.kernel-machines
[9] http://www.support-vector.ws

Questions and Comments ?

Thank You !!
