
Review of Lecture 5

Dichotomies

Break point

Growth function

$m_{\mathcal{H}}(N) = \max_{x_1, \dots, x_N \in \mathcal{X}} |\mathcal{H}(x_1, \dots, x_N)|$
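As a small illustration of the growth function definition, the sketch below counts the dichotomies that positive rays $h(x) = \mathrm{sign}(x - a)$ generate on one set of $N$ distinct points. The function name `positive_ray_dichotomies` and the choice of hypothesis set are assumptions made for illustration. The code evaluates $|\mathcal{H}(x_1, \dots, x_N)|$ on a single point set and does not take the maximum over all point sets.

```python
def positive_ray_dichotomies(points):
    """Return the set of dichotomies that positive rays h(x) = sign(x - a)
    generate on the given 1-D points (hypothetical helper for illustration)."""
    xs = sorted(points)
    # One threshold below all points, one between each adjacent pair, one above all.
    thresholds = [xs[0] - 1.0]
    thresholds += [(lo + hi) / 2.0 for lo, hi in zip(xs, xs[1:])]
    thresholds.append(xs[-1] + 1.0)
    # Each threshold a labels a point +1 if it lies to the right of a, else -1.
    return {tuple(+1 if x > a else -1 for x in points) for a in thresholds}


for n in range(1, 6):
    pts = [float(i) for i in range(n)]
    print(n, len(positive_ray_dichotomies(pts)))
```

For distinct points the count comes out as $N + 1$, far fewer than the $2^N$ possible labelings.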

Maximum # of dichotomies

[Table: dichotomies on $x_1, x_2, x_3$]



Learning From Data


Yaser S. Abu-Mostafa

California Institute of Technology

Lecture 6:

Theory of Generalization

Sponsored by Caltech's Provost Office, E&AS Division, and IST

Thursday, April 19, 2012

Outline

Proof that $m_{\mathcal{H}}(N)$ is polynomial

Proof that $m_{\mathcal{H}}(N)$ can replace $M$

Learning From Data - Lecture 6

Bounding $m_{\mathcal{H}}(N)$

To show: $m_{\mathcal{H}}(N)$ is polynomial

We show: $m_{\mathcal{H}}(N) \le \cdots \le$ a polynomial

Key quantity:

$B(N, k)$: Maximum number of dichotomies on $N$ points, with break point $k$
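To make the definition of $B(N, k)$ concrete, here is a minimal brute-force sketch that searches for the largest set of dichotomies on $N$ points in which no subset of $k$ points shows all $2^k$ patterns. The helper names `has_break_point` and `brute_force_B` are hypothetical, and the exhaustive search is only feasible for very small $N$.

```python
from itertools import combinations, product


def has_break_point(dichotomies, n, k):
    """True if no subset of k of the n points is shattered by the dichotomies."""
    for cols in combinations(range(n), k):
        patterns = {tuple(d[c] for c in cols) for d in dichotomies}
        if len(patterns) == 2 ** k:  # all 2^k patterns appear: these k points are shattered
            return False
    return True


def brute_force_B(n, k):
    """Largest number of dichotomies on n points such that no k points are shattered.
    Exhaustive search over sets of rows, from the largest size down."""
    all_rows = list(product((-1, +1), repeat=n))
    for size in range(len(all_rows), 0, -1):
        for subset in combinations(all_rows, size):
            if has_break_point(subset, n, k):
                return size
    return 0


print(brute_force_B(3, 2))
```

The search counts dichotomies without reference to any particular hypothesis set, which is what makes $B(N, k)$ a purely combinatorial quantity.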


Recursive bound on $B(N, k)$

Consider the following table:

$B(N, k) = \alpha + 2\beta$

    Group    # of rows    x1 x2 ... x_{N-1}                        x_N
    S1       alpha        patterns that appear once                +1 or -1
    S2+      beta         patterns that appear twice               +1
    S2-      beta         the same patterns as in S2+              -1

S2 consists of S2+ and S2-.

Estimating $\alpha$ and $\beta$

Focus on $x_1, x_2, \dots, x_{N-1}$ columns:

$\alpha + \beta \le B(N-1, k)$


Estimating $\beta$ by itself

Now, focus on the $S_2 = S_2^+ \cup S_2^-$ rows:

$\beta \le B(N-1, k-1)$


Putting it together

$B(N, k) = \alpha + 2\beta$

$\alpha + \beta \le B(N-1, k)$

$\beta \le B(N-1, k-1)$

$B(N, k) \le B(N-1, k) + B(N-1, k-1)$

Numerical computation of $B(N, k)$ bound

$B(N, k) \le B(N-1, k) + B(N-1, k-1)$

The two adjacent entries in row $N-1$ (top) add up to the entry in row $N$ (bottom).

                 k
           1    2    3    4    5    6   ..
    N  1   1    2    2    2    2    2   ..
       2   1    3    4    4    4    4   ..
       3   1    4    7    8    8    8   ..
       4   1    5   11   ..
       5   1    6    :
       6   1    7    :
       :   :    :    :
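The table can be filled in mechanically from the recursion. The following is a minimal sketch, assuming the boundary values visible in the table (the $k = 1$ column is all 1s, and the $N = 1$ row is 2 for every $k \ge 2$); the function name `bound` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def bound(n, k):
    """Upper bound on B(n, k) obtained by treating the recursion as an equality.
    Boundary values: bound(n, 1) = 1 and bound(1, k) = 2 for k >= 2."""
    if k == 1:
        return 1
    if n == 1:
        return 2
    # Entry in row n = the two adjacent entries in row n - 1 added together.
    return bound(n - 1, k) + bound(n - 1, k - 1)


for n in range(1, 7):
    print(n, [bound(n, k) for k in range(1, 7)])
```

Each printed row corresponds to one value of $N$, with columns $k = 1, \dots, 6$.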

Analytic solution for $B(N, k)$ bound

$B(N, k) \le B(N-1, k) + B(N-1, k-1)$

Theorem:

$B(N, k) \le \sum_{i=0}^{k-1} \binom{N}{i}$

1. Boundary conditions: easy


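2. The induction step

(top: row $N-1$, columns $k-1$ and $k$; bottom: row $N$, column $k$)

$\sum_{i=0}^{k-1} \binom{N}{i} = \sum_{i=0}^{k-1} \binom{N-1}{i} + \sum_{i=0}^{k-2} \binom{N-1}{i}$ ?

$\sum_{i=0}^{k-1} \binom{N-1}{i} + \sum_{i=0}^{k-2} \binom{N-1}{i}$

$= 1 + \sum_{i=1}^{k-1} \binom{N-1}{i} + \sum_{i=1}^{k-1} \binom{N-1}{i-1}$

$= 1 + \sum_{i=1}^{k-1} \left[ \binom{N-1}{i} + \binom{N-1}{i-1} \right]$

$= 1 + \sum_{i=1}^{k-1} \binom{N}{i} = \sum_{i=0}^{k-1} \binom{N}{i}$ ✓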
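Both parts of the proof can be spot-checked numerically. The sketch below, with the hypothetical helper `binomial_sum`, verifies for small $N$ and $k$ that the closed form takes the boundary values (1 in the $k = 1$ column, 2 in the $N = 1$ row for $k \ge 2$) and satisfies the recursion with equality. It is a sanity check over a finite range, not a substitute for the induction argument.

```python
from math import comb


def binomial_sum(n, k):
    """Closed form from the theorem: sum of C(n, i) for i = 0 .. k-1."""
    return sum(comb(n, i) for i in range(k))


# Boundary conditions: first column (k = 1) and first row (N = 1, k >= 2).
assert all(binomial_sum(n, 1) == 1 for n in range(1, 20))
assert all(binomial_sum(1, k) == 2 for k in range(2, 20))

# Induction step: the closed form equals the sum of the two entries above it.
for n in range(2, 20):
    for k in range(2, 20):
        assert binomial_sum(n, k) == binomial_sum(n - 1, k) + binomial_sum(n - 1, k - 1)

print("closed form satisfies the boundary conditions and the recursion")
```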

It is polynomial!

For a given $\mathcal{H}$, the break point $k$ is fixed

$m_{\mathcal{H}}(N) \le \underbrace{\sum_{i=0}^{k-1} \binom{N}{i}}_{\text{maximum power is } N^{k-1}}$
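To see the $N^{k-1}$ behavior numerically, the sketch below divides the bound by $N^{k-1}$ for growing $N$. The helper name `growth_bound` and the choice $k = 4$ are assumptions for illustration. The ratio should settle near $1/(k-1)!$, the coefficient of the leading term of $\binom{N}{k-1}$.

```python
from math import comb, factorial


def growth_bound(n, k):
    """Polynomial bound on the growth function: sum of C(n, i) for i = 0 .. k-1."""
    return sum(comb(n, i) for i in range(k))


k = 4  # illustrative break point
for n in (10, 100, 1000):
    ratio = growth_bound(n, k) / n ** (k - 1)
    print(n, growth_bound(n, k), round(ratio, 6))

print("limit of the ratio:", 1 / factorial(k - 1))
```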

Three examples

$\sum_{i=0}^{k-1} \binom{N}{i}$

$\mathcal{H}$ is positive rays: (break point $k = 2$)
$m_{\mathcal{H}}(N) = N + 1 \le N + 1$

$\mathcal{H}$ is positive intervals: (break point $k = 3$)
$m_{\mathcal{H}}(N) = \frac{1}{2}N^2 + \frac{1}{2}N + 1 \le \frac{1}{2}N^2 + \frac{1}{2}N + 1$

$\mathcal{H}$ is 2D perceptrons: (break point $k = 4$)
$m_{\mathcal{H}}(N) = \ ? \le \frac{1}{6}N^3 + \frac{5}{6}N + 1$
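The three polynomials on the right-hand side are the binomial sum expanded for $k = 2, 3, 4$. A minimal sketch that checks this with exact rational arithmetic follows; the names `growth_bound` and `closed_forms` are hypothetical.

```python
from fractions import Fraction
from math import comb


def growth_bound(n, k):
    """Sum of C(n, i) for i = 0 .. k-1."""
    return sum(comb(n, i) for i in range(k))


# Expanded polynomials from the three examples, keyed by break point k.
closed_forms = {
    2: lambda n: n + 1,                                            # positive rays
    3: lambda n: Fraction(1, 2) * n**2 + Fraction(1, 2) * n + 1,   # positive intervals
    4: lambda n: Fraction(1, 6) * n**3 + Fraction(5, 6) * n + 1,   # 2D perceptrons
}

for k, poly in closed_forms.items():
    # Exact comparison: Fraction values compare equal to matching integers.
    assert all(growth_bound(n, k) == poly(n) for n in range(1, 50))
    print(k, [growth_bound(n, k) for n in range(1, 6)])
```

For positive rays and positive intervals the bound coincides with the actual growth function; for 2D perceptrons it only gives an upper bound.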

Outline

Proof that $m_{\mathcal{H}}(N)$ is polynomial

Proof that $m_{\mathcal{H}}(N)$ can replace $M$

What we want

Instead of:

$P[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,] \le 2\,M\,e^{-2\epsilon^2 N}$

We want:

$P[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,] \le 2\,m_{\mathcal{H}}(N)\,e^{-2\epsilon^2 N}$

Pictorial proof

How does $m_{\mathcal{H}}(N)$ relate to overlaps?

What to do about $E_{\text{out}}$?

Putting it together

[Figure: the space of data sets in three panels: (a) Hoeffding Inequality, (b) Union Bound, (c) VC Bound]

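What to do about $E_{\text{out}}$

[Figure: a bin labeled $E_{\text{out}}(h)$ with one sample labeled $E_{\text{in}}(h)$, next to the same bin with two samples labeled $E_{\text{in}}(h)$ and $E'_{\text{in}}(h)$]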

Putting it together

Not quite:

$P[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,] \le 2\,m_{\mathcal{H}}(N)\,e^{-2\epsilon^2 N}$

but rather:

$P[\,|E_{\text{in}}(g) - E_{\text{out}}(g)| > \epsilon\,] \le 4\,m_{\mathcal{H}}(2N)\,e^{-\frac{1}{8}\epsilon^2 N}$

The Vapnik-Chervonenkis Inequality
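To get a feel for the right-hand side of the Vapnik-Chervonenkis Inequality, the sketch below evaluates it with $m_{\mathcal{H}}(2N)$ replaced by the polynomial bound $\sum_{i=0}^{k-1} \binom{2N}{i}$. The function names, the tolerance $\epsilon = 0.1$, and the break point $k = 4$ are assumptions chosen for illustration.

```python
from math import comb, exp


def growth_bound(n, k):
    """Polynomial bound on the growth function: sum of C(n, i) for i = 0 .. k-1."""
    return sum(comb(n, i) for i in range(k))


def vc_bound(n, eps, k):
    """Right-hand side 4 * m_H(2N) * exp(-eps^2 * N / 8), with m_H(2N)
    replaced by its polynomial upper bound for break point k."""
    return 4 * growth_bound(2 * n, k) * exp(-(eps ** 2) * n / 8)


for n in (1_000, 10_000, 100_000):
    print(n, vc_bound(n, eps=0.1, k=4))
```

For small $N$ the value exceeds 1 and says nothing about a probability, but the exponential eventually dominates the polynomial factor and the bound falls toward zero.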
