
Laurenz Wiskott

first version 4 April 2006

last revision 12 May 2006

Contents

1 The kernel trick
   1.1 Nonlinear input-output functions
   1.2 Inner product in feature space and the kernel function
   1.3 Input-output function in terms of the kernel function
   1.4 Selecting the basis for the weight vector
   1.5 Mercer's theorem

1.1 Nonlinear input-output functions

If x denotes an input vector and φ_k(x) with k = 1, ..., K is a set of fixed nonlinear scalar functions in x, then

g(x) := Σ_k w_k φ_k(x) = w^T φ(x)    (1)

with real coefficients w_k defines a nonlinear input-output function g(x). On the very right
I have used the more compact vector notation with w := (w_1, ..., w_K)^T and φ := (φ_1, ..., φ_K)^T.

Note that not only x but also the φ_k(x) can be regarded as vectors, which span a vector space
of all functions that can be expressed in the form of equation (1). This vector space is often
referred to as function space (D: Funktionenraum) and it has dimensionality K. Thus, φ is a
vector of vectors. For any concrete input vector x, φ(x) can be considered a simple vector
in a K-dimensional Euclidean vector space, which is often referred to as feature space (D:
Merkmalsraum). The process of going from x to φ(x) is referred to as a mapping (D: Abbildung)
from input space to feature space or as a nonlinear expansion (D: nichtlineare Erweiterung)
(Wiskott, 2006).
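As a minimal sketch of equation (1), the expansion φ and the weights w below are arbitrary illustrative choices (a degree-2 monomial basis and made-up coefficients), not taken from the text:

```python
import numpy as np

# Hypothetical nonlinear expansion phi: R^2 -> R^6, here the degree-2
# monomials of x = (x1, x2); any fixed set of scalar functions would do.
def phi(x):
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x1*x2, x2**2])

# Arbitrary example weight vector w in the K = 6 dimensional feature space.
w = np.array([0.5, -1.0, 2.0, 0.1, 0.3, -0.2])

# Equation (1): the nonlinear input-output function g(x) = w^T phi(x).
def g(x):
    return w @ phi(x)

print(g(np.array([1.0, 2.0])))
```

Although g is nonlinear in x, it is linear in the weights w, which is what makes the machinery below work.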

The dimensionality K of the feature space is usually much greater than the dimensionality of
the input space. However, the manifold populated by expanded input vectors cannot be
higher-dimensional than the input space. For instance, if the input space is two-dimensional and
the feature space is three-dimensional, the two-dimensional manifold of input space gets embedded in
the three-dimensional feature space. It can be folded and distorted, but you cannot make it truly
three-dimensional. Locally it will always be two-dimensional (think of a towel that is rolled up).

© 2006 Laurenz Wiskott. Currently at the Institute for Theoretical Biology at Humboldt-University Berlin and the Bernstein Center for
Computational Neuroscience Berlin, http://itb.biologie.hu-berlin.de/wiskott/.

For high-dimensional feature spaces, the mapping into and computations within feature

space can become computationally very expensive. However, some inner products in

particular feature spaces can be computed very efficiently without the need to actually

do the explicit mapping of the input vectors x into feature space.

1.2 Inner product in feature space and the kernel function

Consider, for example, a two-dimensional input vector x = (x1, x2)^T and the feature space of polynomials
of degree two with the basis functions

φ1 := 1,    (2)
φ2 := √2 x1,  φ3 := √2 x2,    (3)
φ4 := x1²,  φ5 := √2 x1x2,  φ6 := x2².    (4)

Now, if we take the Euclidean inner product of the input vectors mapped into feature
space, we get

φ(x)^T φ(x̃) = (1, √2 x1, √2 x2, x1², √2 x1x2, x2²) (1, √2 x̃1, √2 x̃2, x̃1², √2 x̃1x̃2, x̃2²)^T    (5)
            = 1 + 2x1x̃1 + 2x2x̃2 + x1²x̃1² + 2x1x2x̃1x̃2 + x2²x̃2²    (6)
            = (1 + x1x̃1 + x2x̃2)²    (7)
            = (1 + x^T x̃)².    (8)

(Note that one would get the same result if one dropped the factors √2 in (3) and (4) but
used a different inner product in feature space, namely z^T z̃ := z1z̃1 + 2z2z̃2 + 2z3z̃3 + z4z̃4 + 2z5z̃5 + z6z̃6.
Thus, there is some arbitrariness in the way the nonlinear expansion and the inner product in
feature space are defined.)
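The identity in (5)-(8) is easy to verify numerically; this sketch compares the explicit expansion against the closed form for random inputs (the random test vectors are, of course, arbitrary):

```python
import numpy as np

# Degree-2 expansion with the sqrt(2) factors of equations (2)-(4).
def phi2(x):
    x1, x2 = x
    s2 = np.sqrt(2.0)
    return np.array([1.0, s2 * x1, s2 * x2, x1**2, s2 * x1 * x2, x2**2])

rng = np.random.default_rng(0)
x, xt = rng.standard_normal(2), rng.standard_normal(2)

lhs = phi2(x) @ phi2(xt)      # explicit mapping, then inner product (5)-(6)
rhs = (1.0 + x @ xt) ** 2     # closed form, equation (8)
print(np.isclose(lhs, rhs))   # the two agree up to rounding
```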

Interestingly, this generalizes to arbitrarily high exponents, so that

φ(x)^T φ(x̃) = (1 + x^T x̃)^d    (9)

is the inner product for a particular nonlinear expansion to polynomials of degree d. For d = 3 and

φ1 := 1,    (10)
φ2 := √3 x1,  φ3 := √3 x2,    (11)
φ4 := √3 x1²,  φ5 := √6 x1x2,  φ6 := √3 x2²,    (12)
φ7 := x1³,  φ8 := √3 x1²x2,  φ9 := √3 x1x2²,  φ10 := x2³,    (13)

we get

φ(x)^T φ(x̃) = (1 + x^T x̃)³.    (14)

Equation (9) offers a very efficient way of computing the inner product in feature
space, in particular for high degrees d. Because of the importance of the inner product between
two expanded input vectors we give it a special name and define the kernel function (D:
Kernfunktion)

k(x, x̃) := φ(x)^T φ(x̃).    (15)

Expressing everything in terms of inner products in feature space and using the kernel
function to efficiently compute these inner products is the kernel trick (D: Kern-Trick). It can
save a lot of computation and permits us to use arbitrarily high-dimensional feature spaces.
However, it requires that any computation in feature space be formulated in terms of inner
products, including the nonlinear input-output function, as will be discussed in the next section.
The kernel function given above is just one example; several others exist (Müller et al.,
2001).
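To get a feeling for the savings, one can compare the dimensionality of the explicit feature space with the cost of one kernel evaluation. The input dimension n and degree d below are arbitrary examples:

```python
import numpy as np
from math import comb

# Polynomial kernel of equation (9): one inner product in input space
# plus one power, regardless of the feature-space dimensionality.
def poly_kernel(x, xt, d):
    return (1.0 + x @ xt) ** d

# The corresponding feature space contains all monomials of total
# degree <= d in n variables, i.e. C(n + d, d) basis functions.
n, d = 100, 5
print(comb(n + d, d))   # tens of millions of coordinates avoided

x = np.ones(n)
xt = np.full(n, 0.5)
print(poly_kernel(x, xt, d))   # computed from a single length-100 dot product
```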

1.3 Input-output function in terms of the kernel function

In order to take advantage of the kernel trick, we have to rewrite (1). To do so we define w as a
linear combination of some expanded input vectors x_j with j = 1, ..., J:

w := Σ_j α_j φ(x_j).    (16)

We then get

g(x) = w^T φ(x)                 (by (1))     (17)
     = Σ_j α_j φ(x_j)^T φ(x)    (by (16))    (18)
     = Σ_j α_j k(x_j, x).       (by (15))    (19)

The number J of input vectors x_j used to define w is usually taken to be much smaller
than the dimensionality K of the feature space, because otherwise the computational complexity
of computing g(x) with the kernel trick (19) would be comparable to that of computing
it directly via (1), and there would be no advantage in using the kernel trick. This, however, defines
a subspace of the feature space within which the optimization of g(x) takes place, so one is back
again in a not-so-high-dimensional space. Thus, the kernel trick allows you to work in a potentially
very high-dimensional space, but you have to choose a moderate-dimensional
subspace by selecting the basis for the weight vector before actually starting to optimize it.
Selecting this basis can be an art in itself.
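A sketch of evaluating g(x) through the kernel alone, as in (19); the basis points x_j, coefficients α_j, and the degree-2 polynomial kernel are all hypothetical choices for illustration:

```python
import numpy as np

# Polynomial kernel, equation (9) with d = 2.
def kernel(x, xt, d=2):
    return (1.0 + x @ xt) ** d

# J = 3 hypothetical basis vectors x_j and coefficients alpha_j; together
# they implicitly define w via equation (16) without ever forming phi.
X_basis = np.array([[0.0, 1.0],
                    [1.0, 0.0],
                    [1.0, 1.0]])
alpha = np.array([0.5, -0.3, 1.0])

# Equation (19): g(x) = sum_j alpha_j k(x_j, x).
def g(x):
    return sum(a * kernel(xj, x) for a, xj in zip(alpha, X_basis))

print(g(np.array([2.0, 2.0])))
```

Note that the feature space (here K = 6) never appears explicitly; only J = 3 kernel evaluations are needed per input.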

1.4 Selecting the basis for the weight vector

There are three possibilities for choosing the basis for the weight vectors:

- One simply takes all available data vectors x.
- One selects a subset of the data vectors x with a good heuristic that predicts their usefulness.
- One has a-priori ideas about what good basis vectors might be, regardless of the data vectors.

1.5 Mercer's theorem

It is interesting to note that since everything is done with the kernel function, it is not
even necessary to know the feature space and the inner product within it. It is only
required that the kernel function is such that a corresponding feature space and inner product
exist. There are criteria based on Mercer's theorem that guarantee this existence, see (Müller
et al., 2001). For instance, it is obvious that the kernel function must obey

k(x, x̃) = k(x̃, x),    (20)
k(x, x) ≥ 0.    (21)
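Conditions (20) and (21) can be checked numerically on a Gram matrix. The sketch below also tests positive semi-definiteness, which Mercer-type criteria require more generally; the kernel, data set, and tolerance are arbitrary choices:

```python
import numpy as np

# A degree-3 polynomial kernel, a known valid Mercer kernel.
def kernel(x, xt, d=3):
    return (1.0 + x @ xt) ** d

# Random data set and its Gram matrix K_ij = k(x_i, x_j).
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
K = np.array([[kernel(xi, xj) for xj in X] for xi in X])

print(np.allclose(K, K.T))                    # symmetry, condition (20)
print((np.diag(K) >= 0).all())                # k(x, x) >= 0, condition (21)
print(np.linalg.eigvalsh(K).min() >= -1e-6)   # PSD up to rounding error
```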

References

Müller, K.-R., Mika, S., Rätsch, G., Tsuda, K., and Schölkopf, B. (2001). An introduction to
kernel-based learning algorithms. IEEE Trans. on Neural Networks, 12(2):181-201.

Wiskott, L. (2006). Lecture notes on nonlinear expansion. http://itb.biologie.hu-berlin.de/wiskott/Teaching/LectureNotes/NonlinearExpansion.pdf.
