
Course: DD2427 - Exercise Class 1

Exercise 1: Motivation for the linear neuron


Consider the two-class classification problem. The posterior probability of class \omega_1 given a feature vector x is written as

p(\omega_1 \mid x) = \frac{p(x \mid \omega_1)\, p(\omega_1)}{p(x \mid \omega_1)\, p(\omega_1) + p(x \mid \omega_2)\, p(\omega_2)}

Show that this posterior can be written in the form

p(\omega_1 \mid x) = \frac{1}{1 + \exp(-a)}, \quad \text{where} \quad a = \ln \frac{p(x \mid \omega_1)\, p(\omega_1)}{p(x \mid \omega_2)\, p(\omega_2)}

Say each

p(x \mid \omega_i) = N(0, D_i)

with diagonal covariance

D_i = \begin{pmatrix} \sigma_{i1}^2 & 0 & \cdots & 0 \\ 0 & \sigma_{i2}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{id}^2 \end{pmatrix}

and p(\omega_1) = p(\omega_2); then show that

p(\omega_1 \mid x) = \frac{1}{1 + \exp(-(w^T \phi(x) + w_0))}

where \phi(x) = x.*x (stealing some Matlab notation).
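Not part of the original sheet, but the first identity is easy to check numerically. The sketch below (plain numpy; the covariance diagonals are arbitrarily chosen illustrative values, and the priors are taken equal) verifies that the two-class posterior equals the logistic sigmoid of the log-odds a.

```python
import numpy as np

# Sanity check for Exercise 1: posterior == sigmoid of the log-odds.
# Covariance diagonals are arbitrary illustrative values, not from the sheet.
rng = np.random.default_rng(0)
d = 3
D1, D2 = np.array([1.0, 2.0, 0.5]), np.array([0.8, 1.5, 2.5])
x = rng.normal(size=d)

def log_gauss(x, var):
    # log N(x; 0, diag(var))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + x**2 / var)

log_p1, log_p2 = log_gauss(x, D1), log_gauss(x, D2)   # equal priors cancel
posterior = np.exp(log_p1) / (np.exp(log_p1) + np.exp(log_p2))
a = log_p1 - log_p2                                   # log-odds
sigmoid = 1.0 / (1.0 + np.exp(-a))
print(np.isclose(posterior, sigmoid))                 # True
```

Since both Gaussians have zero mean and diagonal covariance, a is linear in the elementwise squares of x, which is where \phi(x) = x.*x comes from.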

Exercise 2: Fisher's Linear Discriminant and MSE*


The least squares approach to the determination of a linear discriminant was based on the goal of making the model predictions as close as possible to a set of target values. By contrast, the Fisher criterion is derived by requiring maximum class separation in the output space in conjunction with minimum within-class spread. For the two-class problem the Fisher criterion can be seen as a special case of least squares.
Take the targets for class \omega_1 to be n/n_1, where n_1 is the number of patterns from class \omega_1 and n is the total number of patterns. For class \omega_2 take the targets to be -n/n_2, where n_2 is the number of patterns from class \omega_2.

The sum-of-squares error function is written as


J = \sum_{i=1}^{n} \left( w^T x_i + w_0 - t_i \right)^2

where each t_i = n/n_1 or -n/n_2 depending on whether x_i belongs to class \omega_1 or \omega_2. Show that J is minimized when

w \propto S_W^{-1} (m_2 - m_1)

where

m_i = \frac{1}{n_i} \sum_{x \in \omega_i} x

and

S_W = \sum_{x \in \omega_1} (x - m_1)(x - m_1)^T + \sum_{x \in \omega_2} (x - m_2)(x - m_2)^T
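A numerical cross-check (not part of the original exercise) can make the claimed equivalence concrete: fit w by least squares with the targets n/n_1 and -n/n_2, then compare its direction with S_W^{-1}(m_2 - m_1). The data below is synthetic and purely illustrative.

```python
import numpy as np

# Check that least squares with targets n/n1, -n/n2 gives w parallel to
# S_W^{-1} (m2 - m1). Synthetic 2-D Gaussian classes, illustrative only.
rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], 1.0, size=(40, 2))
X2 = rng.normal([3, 1], 1.0, size=(60, 2))
X = np.vstack([X1, X2])
n, n1, n2 = len(X), len(X1), len(X2)
t = np.r_[np.full(n1, n / n1), np.full(n2, -n / n2)]

# Least-squares fit of w^T x + w0 to the targets
A = np.c_[X, np.ones(n)]
w_ls = np.linalg.lstsq(A, t, rcond=None)[0][:2]

# Fisher direction
m1, m2 = X1.mean(0), X2.mean(0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(SW, m2 - m1)

# The two directions should be (anti)parallel
print(np.cross(w_ls, w_fisher))  # ~0 up to floating point
```

The magnitude of w differs between the two fits; only the direction is determined, which is why the result is stated as a proportionality.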

Exercise 3: Linear Separability


Given a set of data points \{x_i\}, we can define the convex hull to be the set of all points x given by

x = \sum_i \alpha_i x_i

where \alpha_i \ge 0 and \sum_i \alpha_i = 1. Consider a second set of points \{y_i\} together with its corresponding convex hull. By definition, the two sets of points will be linearly separable if there exists a vector w and a scalar w_0 such that w^T x_i + w_0 > 0 for all x_i, and w^T y_i + w_0 < 0 for all y_i. Show that if their convex hulls intersect, the two sets of points cannot be linearly separable, and conversely that if they are linearly separable, their convex hulls do not intersect.
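As an illustration (not required for the proof), the contrapositive can be seen numerically: the perceptron algorithm finds a separating (w, w_0) only when the hulls are disjoint. The point sets below are made up for the demonstration.

```python
import numpy as np

def perceptron(X, Y, max_epochs=1000):
    """Return (w, w0) with w.x + w0 > 0 on X and < 0 on Y, or None."""
    # Stack positives and negated negatives, with a bias coordinate
    P = np.vstack([np.c_[X, np.ones(len(X))], -np.c_[Y, np.ones(len(Y))]])
    w = np.zeros(P.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for p in P:
            if w @ p <= 0:   # misclassified: nudge w toward p
                w += p
                mistakes += 1
        if mistakes == 0:
            return w[:-1], w[-1]
    return None  # never converged: presumably not separable

# Disjoint hulls: a separator is found
X = np.array([[2.0, 2.0], [3.0, 1.0]])
Y = np.array([[-1.0, -1.0], [-2.0, 0.0]])
print(perceptron(X, Y))

# Intersecting hulls: the segment (0,0)-(4,4) contains (2,2) from X
Y_bad = np.array([[0.0, 0.0], [4.0, 4.0]])
print(perceptron(X, Y_bad))  # None
```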

Exercise 4: Boosting
Consider the exclusive-OR (XOR) problem. It is defined by two-dimensional points x = (x, y)^T belonging to two classes. These points are

\omega_1 : (1, 1)^T, (-1, -1)^T
\omega_2 : (1, -1)^T, (-1, 1)^T

In this question we will investigate whether it is possible to build a strong classifier using boosting that correctly classifies these points using lines to separate the two classes.

i) Plot the points with different symbols for each class. Are they linearly separable?

ii) First let the set of weak classifiers be vertical and horizontal lines. These define weak classifiers of the form

h_v(x) = \mathrm{sgn}(a_v x + c_v) \qquad h_h(x) = \mathrm{sgn}(a_h y + c_h)

Work through one iteration of the boosting algorithm (see the sketch after this exercise). What is the problem? Can we use this set of weak classifiers to solve the XOR classification problem?

iii) Next consider the set of weak classifiers that are lines of slope 1 and -1. Thus

h_1(x) = \mathrm{sgn}(a_1 x + a_1 y + c_1) \qquad h_2(x) = \mathrm{sgn}(a_2 x - a_2 y + c_2)

Can this set of weak classifiers provide a solution? Sketch one possible strong classifier and use a couple of rounds of the boosting algorithm to compute it. (Remember e^{\ln x} = x when x > 0.)
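As a hint at part ii), not in the original sheet: the sketch below enumerates a representative set of axis-aligned decision stumps on the XOR points, assuming uniform initial weights, and shows that every such weak classifier has weighted error exactly 0.5, so the AdaBoost coefficient \alpha = \frac{1}{2} \ln \frac{1-\epsilon}{\epsilon} is zero.

```python
import numpy as np

# XOR points and labels
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
t = np.array([1, 1, -1, -1])
w = np.full(4, 0.25)  # uniform AdaBoost weights at round 1

best_err = 1.0
for axis in (0, 1):                  # vertical / horizontal lines
    for thresh in (-2.0, 0.0, 2.0):  # left of, between, right of the points
        for sign in (1, -1):
            h = sign * np.sign(X[:, axis] - thresh)
            err = w[h != t].sum()
            best_err = min(best_err, err)
print(best_err)  # 0.5, so alpha = 0.5 * ln((1 - eps) / eps) = 0
```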

Exercise 5: Boosting
Imagine you have started your thesis project (exjobb) and have hired some first-year students to label some training data into two classes for you. Unfortunately, the night before working for you a student spent the night partying until early in the morning. He has therefore created a labelled dataset with lots of labelling errors; say up to 20% of the data is mislabelled. What consequences will this have for your project if you are building a classifier using a boosting mechanism?
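One way to build intuition here (a sketch, not a full experiment): in AdaBoost the weight of a point misclassified at round t is multiplied by e^{\alpha_t} with \alpha_t = \frac{1}{2} \ln \frac{1-\epsilon_t}{\epsilon_t} > 0, so a mislabelled point that the weak learners keep "getting wrong" relative to its wrong label has its weight grow geometrically. The weak-learner error below is an arbitrary assumed value.

```python
import numpy as np

eps = 0.3                       # assumed weighted error of each weak learner
alpha = 0.5 * np.log((1 - eps) / eps)
w = 1.0
for t in range(10):
    w *= np.exp(alpha)          # update for a point misclassified every round
    print(f"round {t + 1}: relative weight {w:.1f}")
```

Weights are renormalised in the real algorithm; it is the ratio against correctly classified points that grows, so boosting increasingly fixates on the noisy labels.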

Exercise 6: SVM
The exclusive-OR (XOR) problem is the simplest problem that cannot be solved using a linear discriminant operating directly on the features. The XOR problem is as follows. We have two-dimensional points belonging to two classes:

\omega_1 : (1, 1)^T, (-1, -1)^T
\omega_2 : (1, -1)^T, (-1, 1)^T

a) Plot these points. Are they linearly separable?

b) Consider the transformation \Phi : R^2 \to R^3 defined by

\Phi(x_1, x_2) = (x_1, x_2, x_1 x_2)^T

Transform the data using \Phi and plot the transformed points. Are these transformed points linearly separable?

c) What is the separating hyperplane in this transformed space? This separating hyperplane results in a non-linear discriminant function in the original space. What is this non-linear discriminant function? Plot it.
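A quick numerical look at part b), not in the original sheet: applying \Phi to the four XOR points shows that the third coordinate x_1 x_2 alone already separates the classes.

```python
import numpy as np

X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], dtype=float)
t = np.array([1, 1, -1, -1])

# Phi(x1, x2) = (x1, x2, x1*x2)
Z = np.c_[X, X[:, 0] * X[:, 1]]
print(Z)
# The third coordinate is +1 for class 1 and -1 for class 2, so a plane
# of constant third coordinate separates the transformed points.
print((np.sign(Z[:, 2]) == t).all())  # True
```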

Exercise 7: SVM*
Consider a Support Vector Machine and the following training data from two categories:

\omega_1 : (1, 1)^T
\omega_2 : (-1, -1)^T, (1, 0)^T, (0, 1)^T

a) Plot these four points and construct by inspection the weight vector for the optimal hyperplane and the optimal margin; that is, compute w and b such that w^T x + b = 0 is the optimal separating hyperplane.

b) What are the support vectors?

c) Construct the solution in the dual space by finding the Lagrange multipliers \alpha_i. (Assume the Lagrange multipliers associated with the feature points that are not support vectors are zero.)
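Part c) can be checked by solving the dual QP numerically. Note that the class-2 point (-1, -1) is a reconstruction: the extraction dropped minus signs, and any sign choice for that point leaves it well off the margin, so the (w, b) and multipliers are unaffected. The sketch below uses scipy's SLSQP as a generic constrained optimizer.

```python
import numpy as np
from scipy.optimize import minimize

# Training data (the (-1, -1) point is a reconstruction; see above)
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
t = np.array([1.0, -1.0, -1.0, -1.0])

# Dual problem: maximize sum(a) - 0.5 a^T K a, K_ij = t_i t_j x_i.x_j,
# subject to a_i >= 0 and sum_i a_i t_i = 0
K = (X @ X.T) * np.outer(t, t)
res = minimize(lambda a: 0.5 * a @ K @ a - a.sum(),
               x0=np.ones(len(t)),
               bounds=[(0, None)] * len(t),
               constraints=[{'type': 'eq', 'fun': lambda a: a @ t}])
a = res.x
w = (a * t) @ X
sv = a > 1e-6                              # support vectors
b = np.mean(t[sv] - X[sv] @ w)             # from t_i (w.x_i + b) = 1
print("alphas:", np.round(a, 3))
print("w:", np.round(w, 3), "b:", round(b, 3))
```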

Exercise 8: k-means clustering


Consider the k-means algorithm applied to a large amount of one-dimensional data that comes from one of two classes with equal prior probability. The class-conditional distribution for each class is Gaussian, with true means \mu = -1 and \mu = 1, and both have standard deviation \sigma = 1. What happens when you apply the k-means algorithm with k = 2 to this data? What can you say about the relationship between the means of the two clusters found and the means of the class-conditional distributions?
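A simulation (not part of the original question) makes the effect visible. It runs plain Lloyd's algorithm on data drawn from the stated mixture (means ±1, \sigma = 1) and compares the cluster means with ±1; the starting means are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
labels = rng.integers(0, 2, n)
x = rng.normal(np.where(labels == 0, -1.0, 1.0), 1.0)

# Lloyd's algorithm with k = 2 on 1-D data
mu = np.array([-0.1, 0.1])     # arbitrary distinct starting means
for _ in range(100):
    assign = np.abs(x[:, None] - mu).argmin(axis=1)   # nearest-mean step
    mu = np.array([x[assign == j].mean() for j in (0, 1)])
print(mu)  # roughly +-1.17: further apart than the true means +-1
```

The clusters are hard assignments split at 0, so each cluster mean is a conditional mean of a truncated mixture and overestimates the separation of the class-conditional means.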

Exercise 9: EM*
We have two coins. The first is a fair coin while the second is not necessarily fair; call its unknown head probability \theta. In summary:

P(H \mid \text{coin 1}) = \frac{1}{2} \qquad P(H \mid \text{coin 2}) = \theta

The data-generation procedure is as follows: coin 1 is tossed. If this results in a head then coin 1 is tossed again, otherwise coin 2 is tossed.

a) What is the probability that the 2nd toss results in a head?

b) The above process is repeated N independent times and n_2 times a head is obtained on the 2nd toss. What is the maximum likelihood estimate for \theta?

c) Say we are told that the process was repeated N times and in total M heads were obtained (this includes both the first and the second toss). What two update equations can we repeatedly apply to obtain an estimate for \theta?
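Not part of the exercise, but a simulation of the process is handy for checking answers to parts a) and b) numerically; the value of \theta below is an arbitrary ground truth.

```python
import numpy as np

rng = np.random.default_rng(3)
theta = 0.7          # arbitrary ground-truth bias of coin 2
N = 200_000

first = rng.random(N) < 0.5               # first toss of coin 1: P(H) = 1/2
p2 = np.where(first, 0.5, theta)          # head -> coin 1 again, else coin 2
second = rng.random(N) < p2

print(second.mean())   # empirical P(head on 2nd toss); compare with part a)
print(second.sum())    # n_2, the count used in part b)
```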
