Liang Tian
tian@csee.wvu.edu, Lane Department of Computer Science & Electrical Engineering, West Virginia University, November 16, 2004
BP Learning Procedure
[Figure: decision surfaces of an MLP classifier and an SVM classifier; the SVM separates the classes with a maximal margin]
An MLP stops training when all points are correctly classified.
The resulting decision surface may not be optimal, and the generalization error may not be minimized.
Local Minima
MLP gradient-descent learning is a nonlinear optimization problem and can become trapped in local minima.
S. Bengio. An Introduction to Statistical Machine Learning: Neural Networks. IDIAP. Available at http://www.idiap.ch/~bengio, May 2003.
SVM Classification
R. Collobert. An Introduction to Statistical Machine Learning: Support Vector Machines. IDIAP. Available at http://www.idiap.ch/~collober, Jan. 2003.
SVM Classification
SVM training pursues two goals simultaneously: margin maximization and correct classification of the training data.
SVM Classification
This is a classic nonlinear optimization problem with inequality constraints.
It is solved by forming the Lagrange function, subject to those constraints.
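The equation images on this slide did not survive extraction; a standard reconstruction of the hard-margin formulation, consistent with the surrounding slides, is:

```latex
% Primal problem: maximize the margin subject to correct classification
\min_{w,b}\ \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i\,(w^{\top}x_i + b) \ge 1, \quad i = 1,\dots,l

% Lagrange function with multipliers \alpha_i \ge 0
L(w,b,\alpha) = \frac{1}{2}\|w\|^2
- \sum_{i=1}^{l} \alpha_i \left[\, y_i\,(w^{\top}x_i + b) - 1 \,\right]
```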
SVM Classification
The solutions for the Lagrange multipliers αᵢ determine the parameters w and b, which define the indicator (decision) function.
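The slide's formulas were lost in extraction; the standard expressions for the solution and the indicator function are:

```latex
% Setting the derivatives of L to zero expresses w via the multipliers
w = \sum_{i=1}^{l} \alpha_i\, y_i\, x_i

% Indicator (decision) function for a new input x
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{l} \alpha_i\, y_i\, x_i^{\top} x + b \right)
```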
SVM Classification
If the data are not linearly separable, it is often easier to separate the two classes by projecting the data into a higher-dimensional space.
SVM Classification
[Figure: network view of the mapping from input space (x1, x2, ..., xi) to Z-space via features φᵢ(x), combined with weights wᵢ and bias b = w₀]
SVM Classification
Problem?
The computation is discouraging if the dimensionality of the Z-space is very large. Kernel functions are introduced to simplify it: a kernel function is evaluated in the input space, bypassing the high dimensionality of the feature space. Common kernels are the polynomial and Gaussian (RBF) kernels.
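A minimal sketch of the kernel trick described above (not part of the original slides): for a degree-2 polynomial kernel on 2-D inputs, the explicit feature map φ is 6-dimensional, yet the kernel evaluated in input space gives exactly the same inner product with far less work.

```python
import math

def poly_kernel(x, z, d=2):
    # Polynomial kernel evaluated entirely in the input space: (x.z + 1)^d
    return (sum(a * b for a, b in zip(x, z)) + 1) ** d

def phi(x):
    # Explicit degree-2 feature map for 2-D input -> 6-D feature (Z) space
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0]

x, z = [1.0, 2.0], [3.0, -1.0]
k_input = poly_kernel(x, z)                             # O(n) work in input space
k_feature = sum(a * b for a, b in zip(phi(x), phi(z)))  # O(n^d) work in Z-space
assert abs(k_input - k_feature) < 1e-9                  # identical result
```

For higher degrees or higher input dimension the explicit map grows combinatorially, while the kernel evaluation stays linear in the input dimension, which is precisely why the high dimensionality of the feature space can be bypassed.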
SVM vs. NN
This is an NN
V. Kecman. Learning and Soft Computing. MIT Press, Cambridge, MA, 2001. ISBN: 0-262-11255-8.
SVM vs. NN
This is an SVM
SVM vs. NN
SVM is a novel type of machine learning algorithm developed by V. Vapnik [1]. Unlike NN training, which minimizes only the error on the training data, SVM training yields a unique and global solution and avoids being trapped at local minima.
[1] V. Vapnik. The Nature of Statistical Learning Theory. Springer, N.Y., 1995. ISBN: 0-387-94559-8.
SVM Applications
OCR (handwritten digit recognition): SVM classifiers achieve test error rates as low as 0.6%.
Müller et al., An introduction to kernel-based learning algorithms, IEEE Trans. Neural Networks, 12(2), 2001, pp. 181-201.
SVM Applications
DNA Data Analysis
SVM Applications
Single-Class Classification
Tax and Duin, Outliers and data descriptions, Proceedings of the 7th Annual Conference of the Advanced School for Computing and Imaging, 2001, pp. 234-241.
[Figure: classification vs. regression]
SVM Regression
SVM Regression
Approximate a set of data consisting of l pairs of training patterns (xᵢ, yᵢ) by the function f(x) = wᵀφ(x) + b, where φ(x) is the high-dimensional feature space that is nonlinearly mapped from the input space x.
L. Tian and A. Noore, A novel approach for short-term load forecasting using support vector machines, International Journal of Neural Systems, vol. 14, no. 5, Oct. 2004.
SVM Regression
w and b can be estimated by minimizing the following regularized risk function
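The risk-function image on this slide was lost; the standard regularized risk with the ε-insensitive loss, consistent with the description on the next slide, is:

```latex
R(w,b) = \frac{1}{2}\|w\|^2
+ C\,\frac{1}{l}\sum_{i=1}^{l} \bigl| y_i - f(x_i) \bigr|_{\varepsilon}

% \varepsilon-insensitive loss: errors smaller than \varepsilon are ignored
\bigl| y - f(x) \bigr|_{\varepsilon} =
\begin{cases}
0, & \text{if } |y - f(x)| \le \varepsilon \\
|y - f(x)| - \varepsilon, & \text{otherwise}
\end{cases}
```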
SVM Regression
The norm term ‖w‖² constrains the model structure capacity in order to obtain better generalization performance. C is the regularization constant, representing the trade-off between the approximation error and the model structure.
SVM Regression
Minimizing the risk objective function R
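The constrained form of this minimization was an image on the original slide; the standard slack-variable formulation it refers to is:

```latex
\min_{w,b,\xi,\xi^*}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*)

\text{subject to}\quad
\begin{aligned}
y_i - w^{\top}\varphi(x_i) - b &\le \varepsilon + \xi_i \\
w^{\top}\varphi(x_i) + b - y_i &\le \varepsilon + \xi_i^* \\
\xi_i,\ \xi_i^* &\ge 0
\end{aligned}
```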
SVM Regression
Then, the solution is given in the form:
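The solution formula itself did not survive extraction; the standard SVR expansion in terms of the kernel K and the Lagrange multipliers αᵢ, αᵢ* is:

```latex
f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b
```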
SVM Regression
The Lagrange multipliers can be obtained by maximizing the dual form:
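The dual objective was an image on the original slide; the standard SVR dual, consistent in its symbols with the solution expansion above, is:

```latex
W(\alpha,\alpha^*) = \sum_{i=1}^{l} y_i\,(\alpha_i - \alpha_i^*)
- \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*)
- \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}
(\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, K(x_i, x_j)

\text{subject to}\quad
\sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0,
\qquad 0 \le \alpha_i,\ \alpha_i^* \le C
```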
L. Tian and A. Noore, On-line software reliability prediction: An approach based on support vector machines, International Journal of Reliability, Quality and Safety Engineering, submitted and under revision.
Parameter Selection
Cao and Tay, Support vector machine with adaptive parameters in financial time series forecasting, IEEE Trans. Neural Networks, 14(6), Nov. 2003, pp. 1506-1518.
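A minimal sketch of grid-search parameter selection with a held-out validation set (not from the original slides, and not Cao and Tay's adaptive method). To keep it dependency-free, RBF-kernel ridge regression stands in for SVR; the hyperparameter names gamma and lam are illustrative.

```python
import itertools
import math

def rbf(x, z, gamma):
    # Gaussian (RBF) kernel on scalars
    return math.exp(-gamma * (x - z) ** 2)

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only)
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_predict(xs_tr, ys_tr, xs_te, gamma, lam):
    # Kernel ridge regression: alpha = (K + lam*I)^-1 y
    n = len(xs_tr)
    K = [[rbf(a, b, gamma) + (lam if i == j else 0.0)
          for j, b in enumerate(xs_tr)] for i, a in enumerate(xs_tr)]
    alpha = solve(K, ys_tr)
    return [sum(alpha[i] * rbf(xs_tr[i], x, gamma) for i in range(n))
            for x in xs_te]

# Toy data: y = sin(x), split into training and validation halves
xs = [i * 0.5 for i in range(20)]
ys = [math.sin(x) for x in xs]
tr_x, tr_y = xs[0::2], ys[0::2]
va_x, va_y = xs[1::2], ys[1::2]

best = None
for gamma, lam in itertools.product([0.1, 1.0, 10.0], [1e-3, 1e-1, 1.0]):
    pred = fit_predict(tr_x, tr_y, va_x, gamma, lam)
    mse = sum((p - y) ** 2 for p, y in zip(pred, va_y)) / len(va_y)
    if best is None or mse < best[0]:
        best = (mse, gamma, lam)
print("best (mse, gamma, lambda):", best)
```

The same loop carries over directly to SVM parameters (C, ε, kernel width); cross-validation over several folds is the usual refinement of the single split used here.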
Summary
Both NN and SVM learn from experimental data.
Both NN and SVM are universal approximators.
After learning, both NN and SVM have the same mathematical model structure.
Speeding up learning when the data set is large: chunking (training on subsets of the data) and improved optimization techniques.
Parameter selection and optimization; modified and adaptive SVMs and other variations.
Thank You !!