Show that this posterior can be written in the form
$$p(\omega_1 \mid x) = \frac{1}{1 + \exp(-a)}, \qquad \text{where } a = \ln \frac{p(x \mid \omega_1)\, p(\omega_1)}{p(x \mid \omega_2)\, p(\omega_2)} .$$
Say each $p(x \mid \omega_i) = \mathcal{N}(0, D_i)$ with diagonal
$$D_i = \begin{pmatrix} \sigma_{i1}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{i2}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{id}^2 \end{pmatrix} .$$
Consider the sum-of-squares error function
$$J = \sum_{i=1}^{n} \left( w^T x_i + w_0 - t_i \right)^2$$
with targets $t_i = n/n_1$ for points in class $\omega_1$ and $t_i = -n/n_2$ for points in class $\omega_2$, where
$$m_i = \frac{1}{n_i} \sum_{x \in \omega_i} x$$
and
$$S_W = \sum_{x \in \omega_1} (x - m_1)(x - m_1)^T + \sum_{x \in \omega_2} (x - m_2)(x - m_2)^T .$$
Given a set of points $\{x_i\}$, their convex hull is the set of all points $x$ of the form
$$x = \sum_i \alpha_i x_i$$
where $\alpha_i \geq 0$ and $\sum_i \alpha_i = 1$. Consider a second set of points $\{y_i\}$ together with their corresponding convex hull. By definition, the two sets of points will be linearly separable if there exists a vector $w$ and a scalar $w_0$ such that $w^T x_i + w_0 > 0$ for all $x_i$, and $w^T y_i + w_0 < 0$ for all $y_i$. Show that if their convex hulls intersect, the two sets of points cannot be linearly separable, and conversely that if they are linearly separable, their convex hulls do not intersect.
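A hint for the first direction, sketched from the definitions above (the converse is the contrapositive of this argument):

```latex
Suppose $z$ lies in both convex hulls, i.e.
\[
  z = \sum_i \alpha_i x_i = \sum_j \beta_j y_j ,
  \qquad \alpha_i, \beta_j \ge 0, \quad
  \sum_i \alpha_i = \sum_j \beta_j = 1 .
\]
If a separating $(w, w_0)$ existed, then because $\sum_i \alpha_i = 1$,
\[
  w^T z + w_0 = \sum_i \alpha_i \bigl( w^T x_i + w_0 \bigr) > 0 ,
\]
while the same computation with the $\beta_j$ gives
$w^T z + w_0 = \sum_j \beta_j \bigl( w^T y_j + w_0 \bigr) < 0$,
a contradiction.
```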
Exercise 4: Boosting
Consider the exclusive-OR (XOR) problem. It is defined by two-dimensional points (x = (x, y)ᵀ) belonging to two classes. These points are

ω₁ : (1, 1)ᵀ, (−1, −1)ᵀ
ω₂ : (1, −1)ᵀ, (−1, 1)ᵀ

In this question we will investigate whether it is possible to build a strong classifier, using boosting, that correctly classifies these points using lines to separate the two classes.

i) Plot the points with different symbols for each class. Are they linearly separable?
ii) First, let the set of weak classifiers be vertical and horizontal lines. These lines define weak classifiers of the form

h_v(x) = sgn(a_v x + c_v),   h_h(x) = sgn(a_h y + c_h).

Work through one iteration of the boosting algorithm. What is the problem? Can we use this set of weak classifiers to solve the XOR classification problem?

iii) Next, consider the set of weak classifiers that are lines of slope 1 and −1. Thus

h₁(x) = sgn(a₁x + a₁y + c₁),   h₂(x) = sgn(a₂x − a₂y + c₂).
Can this set of weak classifiers provide a solution? Sketch one possible strong classifier, and use a couple of rounds of the boosting algorithm to compute it. (Remember that e^{ln(x)} = x when x > 0.)
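As a sanity check on part (iii), here is a small AdaBoost sketch over a finite pool of slope ±1 line classifiers. The particular offsets c = ±1, the polarity flips, and the round count are illustrative choices, not prescribed by the exercise:

```python
import numpy as np

# XOR training set: class omega_1 -> +1, class omega_2 -> -1
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

def sgn(z):
    return np.where(z >= 0, 1, -1)

# Weak classifiers h(x) = s * sgn(a*x + b*y + c): lines of slope +/-1.
# Offsets c = +/-1 and polarities s are illustrative choices.
pool = [(a, b, c, s)
        for (a, b) in [(1, 1), (1, -1)]
        for c in (-1.0, 1.0)
        for s in (1, -1)]

def predict(h, X):
    a, b, c, s = h
    return s * sgn(a * X[:, 0] + b * X[:, 1] + c)

# AdaBoost: repeatedly pick the weak classifier with the lowest
# weighted error, then upweight the points it misclassifies.
w = np.full(len(y), 1.0 / len(y))
ensemble = []
for _ in range(25):
    errs = [w @ (predict(h, X) != y) for h in pool]
    h = pool[int(np.argmin(errs))]
    eps = min(errs)
    alpha = 0.5 * np.log((1 - eps) / eps)   # this is where e^ln(x) = x is used
    ensemble.append((alpha, h))
    w *= np.exp(-alpha * y * predict(h, X))
    w /= w.sum()

strong = sgn(sum(alpha * predict(h, X) for alpha, h in ensemble))
print((strong == y).all())   # True: the strong classifier solves XOR
```

No single line in the pool gets all four points right, but the weighted vote of a few of them does, which is exactly the behaviour part (iii) asks you to demonstrate by hand.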
Exercise 5: Boosting
Imagine you have started your ex-jobb (degree project) and you have hired some first-year students to label some training data into two classes for you. Unfortunately, the night before working for you, a student spent the night partying until early in the morning. As a result, he has created a labelled dataset with lots of labelling errors: say up to 20% of the data is misclassified. What consequences will this have for your project if you are building a classifier using a boosting mechanism?
Exercise 6: SVM
The exclusive-OR (XOR) is the simplest problem that cannot be solved using a linear discriminant operating directly on the features. The XOR problem is as follows. We have two-dimensional points belonging to two classes. They are:

ω₁ : (1, 1)ᵀ, (−1, −1)ᵀ
ω₂ : (1, −1)ᵀ, (−1, 1)ᵀ

a) Plot these points. Are they linearly separable?

b) Consider the transformation φ : R² → R³ defined by φ(x₁, x₂) = (x₁, x₂, x₁x₂)ᵀ. Transform the data using φ and plot the transformed points. Are these transformed points linearly separable?
c) What is the separating hyper-plane in this transformed space? This separating hyper-plane results in a non-linear discriminant function in the original space. What is this non-linear discriminant function? Plot it.
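A quick check of part (b): after applying φ, the third coordinate x₁x₂ alone already tells the two classes apart (the ±1 class labels are my encoding, not from the exercise):

```python
import numpy as np

# XOR points: class omega_1 -> +1, class omega_2 -> -1
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
y = np.array([1, 1, -1, -1])

def phi(x):
    # phi : R^2 -> R^3, (x1, x2) |-> (x1, x2, x1*x2)
    return np.array([x[0], x[1], x[0] * x[1]])

Z = np.array([phi(x) for x in X])
print(Z[:, 2].tolist())   # [1.0, 1.0, -1.0, -1.0]: sign matches the class
```

Since the third coordinate is +1 for every ω₁ point and −1 for every ω₂ point, the transformed points are linearly separable, which is what part (b) asks you to observe from the plot.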
Exercise 7: SVM*
Consider a Support Vector Machine and the following training data from two categories:

ω₁ : (1, 1)ᵀ
ω₂ : (−1, −1)ᵀ, (1, 0)ᵀ, (0, 1)ᵀ

a) Plot these four points and construct by inspection the weight vector for the optimal hyper-plane and the optimal margin, that is, compute w and b such that wᵀx + b = 0 is the optimal separating hyper-plane.

b) What are the support vectors?

c) Construct the solution in the dual space by finding the Lagrange multipliers, αᵢ. (Assume the Lagrange multipliers associated with the feature points that are not support vectors are zero.)
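One way to check an answer to parts (a) and (b): guess the support vectors from the plot and impose the hard-margin conditions wᵀx + b = ±1 on them. The point list below assumes ω₂'s first point is (−1, −1) (minus signs were dropped in this copy of the sheet), and the candidate supports are my reading of the plot:

```python
import numpy as np

# Assumed points: omega_1 = {(1,1)}, omega_2 = {(-1,-1), (1,0), (0,1)}
X = np.array([[1, 1], [-1, -1], [1, 0], [0, 1]], dtype=float)
y = np.array([1, -1, -1, -1])

# Candidate support vectors read off the plot: (1,1), (1,0), (0,1).
# Imposing w^T x + b = y on each gives a 3x3 linear system in (w1, w2, b).
S = np.array([[1, 1], [1, 0], [0, 1]], dtype=float)
ts = np.array([1.0, -1.0, -1.0])
A = np.hstack([S, np.ones((3, 1))])
w1, w2, b = np.linalg.solve(A, ts)
print(round(w1, 6), round(w2, 6), round(b, 6))   # 2.0 2.0 -3.0

# Every training point must satisfy the margin constraint y (w^T x + b) >= 1
margins = y * (X @ np.array([w1, w2]) + b)
print((margins >= 1 - 1e-9).all())               # True
```

This only verifies feasibility and the active-margin conditions; confirming optimality is exactly what the dual multipliers of part (c) are for.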
Exercise 9: EM*
We have two coins. The first is a fair coin while the second is not necessarily fair; let θ denote its probability of heads. In summary:

P(H | coin 1) = 1/2,   P(H | coin 2) = θ.

The tossing procedure is as follows: coin 1 is tossed. If this results in a head, then coin 1 is tossed again; otherwise coin 2 is tossed.

a) What is the probability that the 2nd toss results in a head?

b) The above process is repeated N independent times, and n₂ times a head is obtained on the 2nd toss. What is the maximum likelihood estimate for θ?

c) Say we are told that the process was repeated N times and in total M heads were obtained (this count includes both the first and the second tosses). What two update equations can we repeatedly apply to obtain an estimate for θ?
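A simulation can sanity-check answers to parts (a) and (b). By conditioning on the first toss, P(H on 2nd toss) = (1/2)(1/2) + (1/2)θ = 1/4 + θ/2, and inverting this for the binomial MLE gives θ̂ = 2n₂/N − 1/2. The bias θ = 0.7 and the trial count below are illustrative choices:

```python
import random

random.seed(0)
theta = 0.7        # illustrative bias for coin 2 (not given in the exercise)
N = 200_000        # number of independent repetitions of the procedure

n2 = 0             # count of heads on the 2nd toss
for _ in range(N):
    if random.random() < 0.5:                 # 1st toss (coin 1) is a head,
        head2 = random.random() < 0.5         # ... so toss coin 1 again
    else:
        head2 = random.random() < theta       # ... otherwise toss coin 2
    n2 += head2

p_hat = n2 / N                  # should be close to 1/4 + theta/2 = 0.6
theta_hat = 2 * n2 / N - 0.5    # ML estimate, should be close to theta
print(p_hat, theta_hat)
```

With N this large, both estimates land within about a third of a percentage point of their targets, matching the closed forms above.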