Zahra Zojaji
Introduction
The bound on the risk is the sum of two terms: the empirical risk and the confidence interval.
Two constructive approaches follow from this bound:
1. Keep the confidence interval fixed and minimize the empirical risk (e.g. neural networks); this approach can suffer from the overfitting problem.
2. Keep the value of the empirical risk fixed (say, equal to zero) and minimize the confidence interval (i.e. SVM).
Sigmoid function
A smooth sigmoid function is used, for example
$S(u) = \frac{1}{1 + e^{-u}}.$
For a vector $u = (u^1, \dots, u^n)$, $S(u)$ denotes the sigmoid function of the vector, defined as the vector whose coordinates are transformed by the sigmoid. In explicit form:
$S(u) = (S(u^1), \dots, S(u^n)).$
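The coordinate-wise action of the sigmoid can be sketched as follows (a minimal illustration; the input vector is a made-up example):

```python
import numpy as np

def sigmoid(u):
    """Logistic sigmoid S(u) = 1 / (1 + exp(-u)), applied element-wise.

    For a vector input, each coordinate is transformed independently,
    matching the definition S(u) = (S(u^1), ..., S(u^n)).
    """
    return 1.0 / (1.0 + np.exp(-u))

u = np.array([-2.0, 0.0, 2.0])   # hypothetical input vector
print(sigmoid(u))                # S(0) = 0.5; S(-z) + S(z) = 1
```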
Back-propagation consists of three steps:
I. Forward pass: propagate the input through the network to compute the outputs of all layers.
II. Backward pass: propagate the error derivatives back through the network.
III. Weight update: adjust the weights using the computed gradients.
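The three steps above can be sketched for a one-hidden-layer network (a minimal sketch assuming squared-error loss; the layer sizes, sample, target, and learning rate are all made-up values):

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def bp_step(x, y, W1, W2, lr=0.1):
    """One back-propagation step; mutates W1 and W2 in place."""
    # I. Forward pass: compute activations layer by layer.
    h = sigmoid(W1 @ x)                         # hidden activations
    out = sigmoid(W2 @ h)                       # network output
    # II. Backward pass: error derivatives (sigmoid' = s * (1 - s)).
    delta_out = (out - y) * out * (1.0 - out)
    delta_h = (W2.T @ delta_out) * h * (1.0 - h)
    # III. Weight update: gradient step on both weight matrices.
    W2 -= lr * np.outer(delta_out, h)
    W1 -= lr * np.outer(delta_h, x)
    return out

# Hypothetical setup: 2 inputs, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, y = np.array([0.5, -0.3]), np.array([0.8])
for _ in range(1000):
    out = bp_step(x, y, W1, W2, lr=1.0)  # output approaches the target
```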
Remarks on BP
The sigmoid function has a scaling factor that affects the quality
of the approximation.
Hyperplane
A hyperplane $(w \cdot x) + b = 0$ classifies vectors $x$ as follows:
$f(x) = \operatorname{sign}((w \cdot x) + b).$
The optimal hyperplane is the one that satisfies the separation conditions (5.8) and minimizes $\|w\|^2$.
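The decision rule of a separating hyperplane can be sketched as follows (a minimal illustration; the weight vector and bias are made-up values):

```python
import numpy as np

def classify(x, w, b):
    """Decision rule of a separating hyperplane: f(x) = sign((w . x) + b)."""
    return 1 if np.dot(w, x) + b > 0 else -1

w = np.array([1.0, -1.0])   # hypothetical weight vector
b = 0.0                     # hypothetical bias
print(classify(np.array([2.0, 1.0]), w, b))   # positive side of the hyperplane
print(classify(np.array([0.0, 3.0]), w, b))   # negative side of the hyperplane
```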
Theorem 5.1.
Corollary
where m is the number of support vectors.
where
x*(1): support vector belonging to the first class,
x*(-1): support vector belonging to the second class.
The optimal hyperplane is found by solving:
1. Primal problem: minimize $\frac{1}{2}\|w\|^2$ subject to the constraints $y_i((w \cdot x_i) + b) \ge 1$, $i = 1, \dots, \ell$.
2. Dual problem: maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ subject to $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$.
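The SV dual problem maximizes $W(\alpha) = \sum_i \alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j (x_i \cdot x_j)$ under $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0$. A minimal sketch evaluating this objective (the two-point training set and the multipliers are made-up values; for this symmetric pair, $\alpha = (0.25, 0.25)$ happens to be optimal):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    K = X @ X.T                              # Gram matrix of inner products
    return alpha.sum() - 0.5 * (alpha * y) @ K @ (alpha * y)

X = np.array([[1.0, 1.0], [-1.0, -1.0]])     # one point per class
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])               # satisfies sum_i alpha_i y_i = 0, alpha_i >= 0
print(dual_objective(alpha, X, y))           # equals 0.5 * ||w||^2 at the optimum
```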
To construct polynomials of degree d << n in n-dimensional space one needs more than $n^d/d!$ features.
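The growth of the feature count can be checked directly: the number of distinct monomials of degree at most d in n variables is $\binom{n+d}{d}$, which behaves like $n^d/d!$ for d << n (the values n = 200, d = 5 below are illustrative):

```python
from math import comb, factorial

def num_features(n, d):
    """Number of distinct monomials of degree <= d in n variables: C(n + d, d)."""
    return comb(n + d, d)

n, d = 200, 5
print(num_features(n, d))        # exact count: already in the billions
print(n**d // factorial(d))      # the n^d / d! approximation
```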
5.6.1 Generalization in High-Dimensional Space
Theorem 5.2
5.6.3 Constructing SV Machines
5.6.4 Examples of SV Machines
Radial basis function machines implement decision rules of the form
$f(x) = \operatorname{sign}\big(\sum_i a_i K_\gamma(|x - x_i|) - b\big),$
where $K_\gamma(|x - x_i|)$ depends on the distance between the two vectors. The function $K_\gamma(z)$ is, for any fixed $\gamma$, a nonnegative monotonic function; it tends to zero as $z$ goes to infinity. The most popular function of this type is
$K_\gamma(|x - x_i|) = \exp(-\gamma |x - x_i|^2).$
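The Gaussian radial basis kernel and its stated properties can be sketched directly (the sample vectors and $\gamma$ are made-up values):

```python
import numpy as np

def rbf_kernel(x, xi, gamma=1.0):
    """Gaussian radial basis kernel: exp(-gamma * |x - xi|^2).

    Nonnegative, monotonically decreasing in the distance |x - xi|,
    and tending to zero as the distance goes to infinity.
    """
    return np.exp(-gamma * np.sum((x - xi) ** 2))

x = np.array([0.0, 0.0])
print(rbf_kernel(x, np.array([0.0, 0.0])))   # 1.0 at zero distance
print(rbf_kernel(x, np.array([3.0, 4.0])))   # nearly zero at distance 5
```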
Two-layer neural network machines use the sigmoid kernel
$K(x, x_i) = S(v(x \cdot x_i) + c).$
In contrast to kernels for polynomial machines or for radial basis function machines, which always satisfy Mercer's conditions, the sigmoid kernel satisfies Mercer's conditions only for some values of the parameters $v$, $c$. For these values of the parameters one can construct SV machines implementing the rule
$f(x) = \operatorname{sign}\big(\sum_i \alpha_i S(v(x \cdot x_i) + c) + b\big).$
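The parameter dependence of Mercer's conditions can be observed numerically: a Mercer kernel must produce a positive semidefinite Gram matrix on every sample, so a single negative eigenvalue disproves it. The sketch below (the tanh form of the sigmoid, the two sample points, and the parameter values are illustrative choices) exhibits a negative eigenvalue for one setting of $c$; a nonnegative spectrum on one sample, as in the second setting, is of course consistent with Mercer's conditions but does not prove them:

```python
import numpy as np

def sigmoid_kernel(X, v, c):
    """Sigmoid-kernel Gram matrix K_ij = tanh(v * (x_i . x_j) + c)."""
    return np.tanh(v * (X @ X.T) + c)

def min_eigenvalue(K):
    """Smallest eigenvalue of a symmetric matrix (negative => not PSD)."""
    return np.linalg.eigvalsh(K).min()

X = np.array([[0.1], [-0.1]])                            # two sample points
print(min_eigenvalue(sigmoid_kernel(X, v=1.0, c=-1.0)))  # negative: Mercer fails here
print(min_eigenvalue(sigmoid_kernel(X, v=1.0, c=1.0)))   # nonnegative on this sample
```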
Results
How close is the upper bound on the error rate that this machine achieves to the smallest possible?
5.8 REMARKS ON SV MACHINES
SV machines minimize the upper bound on the error rate for the
structure given on a set of functions in a feature space. For the best
solution it is necessary that the vectors wi in (5.35) coincide with
support vectors. SV machines find the functions from the set (5.35)
that separate the training data and belong to the subset with the
smallest bound of the VC dimension.
where $x^i$, $i = 1, \dots, n$, are the coordinates of the vector $x$, and $a_i$, $b_i$ are given constants.
For this specific quadratic programming problem fast algorithms exist.
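What makes this quadratic programming problem fast is the box structure of the constraints: projecting onto $a_i \le x^i \le b_i$ is a coordinate-wise clip. A minimal projected-gradient sketch (not one of the specialized fast algorithms; the quadratic form, bounds, step size, and iteration count are made-up values):

```python
import numpy as np

def projected_gradient_qp(Q, p, a, b, steps=500, lr=0.1):
    """Minimize 1/2 x'Qx + p'x subject to a_i <= x_i <= b_i.

    The projection onto the feasible box is just np.clip, which is
    what makes special-purpose algorithms for this problem so cheap.
    """
    x = np.clip(np.zeros_like(p), a, b)
    for _ in range(steps):
        grad = Q @ x + p
        x = np.clip(x - lr * grad, a, b)   # gradient step, then project onto the box
    return x

Q = np.array([[2.0, 0.0], [0.0, 2.0]])
p = np.array([-2.0, -8.0])
a, b = np.array([0.0, 0.0]), np.array([2.0, 2.0])
print(projected_gradient_qp(Q, p, a, b))   # unconstrained minimum (1, 4) clipped to (1, 2)
```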
5.9.1 Logistic Regression
The End