
Midterm Examination

CS 540-3: Introduction to Artificial Intelligence


March 12, 2015

LAST NAME: SOLUTIONS

FIRST NAME:

Problem     Score         Max Score
1           ___________   18
2           ___________   16
3           ___________   15
4           ___________   6
5           ___________   8
6           ___________   18
7           ___________   19
Total       ___________   100


Question 1. [18] State Space Search


Consider the following state space graph where the directed arcs represent the legal
successors of a node. The cost of moving to a successor node is given by the number on the
arc. The value of a heuristic function, h, if computed at a state, is shown inside each node. The
start state is S and the goal is G.
[Figure: state space graph with start node S, goal node G, and intermediate nodes A-E; each node is labeled with its heuristic value h and each directed arc with its cost (figure not reproduced here).]
When a node is expanded, assume its children are put in the Frontier set in alphabetical order
so that the child closest to the front of the alphabet is removed before its other siblings (for all
uninformed searches and for ties in informed searches).
For each of the search methods below, give (i) the sequence of nodes removed from the
Frontier (for expansion or before halting at the goal), and (ii) the solution path found.
(a) [6] Uniform-Cost graph search (i.e., use an Explored set)

Nodes removed:
Solution:

S B A D C E G

S B D G


(b) [6] Greedy Best-First tree search (i.e., no repeated state checking)

Nodes removed:
Solution:

S A G

S G

(c) [6] A* tree search (i.e., no repeated state checking)

Nodes removed:
Solution:

S A D B D G

S B D G
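Since the graph figure is not reproduced above, the following is only a minimal Python sketch of how the three searches differ; the graph, heuristic values, and arc costs in it are hypothetical placeholders, not the exam's figure. All three methods are the same best-first loop with different Frontier priorities (Uniform-Cost: g, Greedy: h, A*: g + h), with ties broken alphabetically as the question specifies.

```python
import heapq

# Hypothetical graph and heuristic -- placeholders, NOT the exam's figure.
GRAPH = {'S': [('A', 2), ('B', 1)], 'A': [('G', 9)], 'B': [('D', 3)],
         'D': [('G', 4)], 'G': []}
H = {'S': 6, 'A': 4, 'B': 6, 'D': 1, 'G': 0}

def best_first(graph, start, goal, priority, use_explored=True):
    """Generic best-first search.  priority(g, n) ranks Frontier entries:
    uniform-cost = g, greedy = H[n], A* = g + H[n].  Heap entries are
    (priority, node, g, path), so equal priorities are broken alphabetically
    by node name, matching the exam's tie-breaking rule.  Returns the
    sequence of nodes removed from the Frontier and the solution path."""
    frontier = [(priority(0, start), start, 0, [start])]
    explored, removed = set(), []
    while frontier:
        _, node, g, path = heapq.heappop(frontier)
        if use_explored and node in explored:
            continue
        removed.append(node)
        if node == goal:
            return removed, path
        explored.add(node)
        for succ, cost in graph[node]:
            if not use_explored or succ not in explored:
                heapq.heappush(frontier, (priority(g + cost, succ),
                                          succ, g + cost, path + [succ]))
    return removed, None

print(best_first(GRAPH, 'S', 'G', lambda g, n: g))                # (a) UCS graph search
print(best_first(GRAPH, 'S', 'G', lambda g, n: H[n], False))      # (b) Greedy tree search
print(best_first(GRAPH, 'S', 'G', lambda g, n: g + H[n], False))  # (c) A* tree search
```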


Question 2. [16] Search


(a) [2] True or False: Depth-First search using iterative deepening can be made to return the
same solution as Breadth-First search.
True
(b) [2] True or False: In a finite search space containing no goal state, A* will always explore
all states.
True
(c) [2] True or False: The solution path found by Uniform-Cost search may change if we add
the same positive constant, c, to every arc cost.
True. For example, suppose there are two paths, S-A-G and S-G, with
cost(S,A) = 1, cost(A,G) = 1, and cost(S,G) = 3, so the optimal path
goes through A (cost 2 vs. 3). If we now add c = 2 to every arc cost,
S-A-G costs 3 + 3 = 6 while S-G costs 5, so UCS will now find the
direct path from S to G.
(d) [2] True or False: If h is a consistent heuristic, then h is also an admissible heuristic.
True. A heuristic is consistent iff for every node n and
every successor n of n generated by any action a, h(n)
c(n, a, n) + h(n). One simple proof is by induction on
the number k of nodes on the shortest path to any goal from
n. For k = 1, let n be the goal node; then h(n) c(n, a,
n). For the inductive case, assume n is on the shortest
path k steps from the goal and that h(n) is admissible by
hypothesis; then h(n) c(n, a, n) + h(n) c(n, a, n) +
h(n) = h(n) so h(n) at k + 1 steps from the goal is also
admissible.
(e) [2] True or False: A* search will always expand fewer nodes than Uniform-Cost search.
False
(f) [6] Suppose you are using a Genetic Algorithm. Two individuals (i.e., candidate solutions)
in the current generation are given by 8-digit sequences: 1 4 6 2 5 7 2 3 and 8 5 3 4 6 7 6 1.
(i) [3] What is the result of performing 1-point crossover with a cross-point between the
third and fourth digits?
1 4 6 4 6 7 6 1
and
8 5 3 2 5 7 2 3
(ii) [3] If there are n individuals in the current generation, how many crossover
operations are used to produce the next generation?
If there are n individuals in the current generation, then
there should be n individuals in every generation.
Crossover takes two individuals from the current generation
and generates two children individuals in the next
generation. Hence n/2 crossover operations are used.
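For part (i), here is a minimal Python sketch (illustrative, not part of the exam) of 1-point crossover; a cut after the third gene corresponds to the cross-point between the third and fourth digits.

```python
def one_point_crossover(parent1, parent2, cut):
    """Swap the tails of two equal-length individuals after position `cut`."""
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2

p1 = [1, 4, 6, 2, 5, 7, 2, 3]
p2 = [8, 5, 3, 4, 6, 7, 6, 1]
c1, c2 = one_point_crossover(p1, p2, cut=3)
print(c1)  # [1, 4, 6, 4, 6, 7, 6, 1]
print(c2)  # [8, 5, 3, 2, 5, 7, 2, 3]
```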

Question 3. [15] Game Playing



Consider the following game tree in which the root corresponds to a MAX node and the values
of a static evaluation function, if applied, are given at the leaves.

(a) [4] What are the minimax values computed at each node in this game tree? Write your
answers to the LEFT of each node in the tree above.
E=3, F=8, G=7, H=1, I=5, J=8, K=10, B=3, C=1, D=8, A=8
(b) [4] Which nodes are not examined when Alpha-Beta Pruning is performed? Assume
children are visited left to right.
O, Q, I, (T, U,) Y

(c) [3] Is there a different ordering of the children of the root for which more pruning would result
by Alpha-Beta? If so, give the order. If not, say why not.
Yes, when the children are ordered (D, B, C) or (D, C, B).

(d) [4] Now assume your opponent chooses her move uniformly at random when it's her turn
(e.g., if there are two moves, half the time she picks the first move and half the time she
picks the second), and you know this. You still seek to maximize your chances of winning.
What are the expected minimax values computed at each node in this case? Write your
answers to the RIGHT of each node in the tree above.
At MAX nodes compute the maximum of the children's values, but at
MIN nodes compute the average of the children's values. So, the backed-up
values are E=3, F=8, G=7, H=1, I=5, J=8, K=10, B=(3+8+7)/3=6,
C=(1+5)/2=3, D=(8+10)/2=9, A=9
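The game tree figure is not reproduced here, but the backed-up values can be checked mechanically. The following Python snippet (illustrative, not part of the exam) takes the values of E-K from the part (a) solution as the values of B's, C's, and D's children, and backs them up with MIN for part (a) and with an average for the random opponent of part (d).

```python
# Values of B's, C's and D's children, taken from the part (a) solution.
children = {'B': [3, 8, 7], 'C': [1, 5], 'D': [8, 10]}

def back_up(opponent):
    """Back up one MIN/chance layer, then take MAX at the root A."""
    backed = {n: opponent(v) for n, v in children.items()}
    return backed, max(backed.values())

minimax, a_minimax = back_up(min)                         # part (a)
expecti, a_expecti = back_up(lambda v: sum(v) / len(v))   # part (d)
print(minimax, a_minimax)   # {'B': 3, 'C': 1, 'D': 8} 8
print(expecti, a_expecti)   # {'B': 6.0, 'C': 3.0, 'D': 9.0} 9.0
```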

Question 4. [6] Hierarchical Clustering



You are given the following table of distances between all pairs of five clusters:
        A      B      C      D      E
A       -    1075   2013   2054    996
B     1075     -    3273   2687   2037
C     2013   3273     -     808   1307
D     2054   2687    808     -    1059
E      996   2037   1307   1059     -

(a) [3] Which pair of clusters will be merged into one cluster at the next iteration of hierarchical
agglomerative clustering using single linkage?

The closest pair is C and D with distance 808.

(b) [3] What will the new values be in the resulting table corresponding to the four new clusters?
Include the cluster names in the first row and first column; if clusters x and y were merged,
name that cluster x+y.
        A      B     C+D     E
A       -    1075   2013    996
B     1075     -    2687   2037
C+D   2013   2687     -    1059
E      996   2037   1059     -
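As an illustration of this merge step, here is a minimal Python sketch (not part of the exam) of one iteration of single-linkage agglomerative clustering on the distances above: it picks the closest pair and recomputes the merged cluster's distances as the minimum over its members.

```python
from itertools import combinations

# Pairwise distances from the table above.
d = {('A', 'B'): 1075, ('A', 'C'): 2013, ('A', 'D'): 2054, ('A', 'E'): 996,
     ('B', 'C'): 3273, ('B', 'D'): 2687, ('B', 'E'): 2037,
     ('C', 'D'): 808,  ('C', 'E'): 1307, ('D', 'E'): 1059}

def dist(x, y):
    return d[(x, y)] if (x, y) in d else d[(y, x)]

clusters = ['A', 'B', 'C', 'D', 'E']
x, y = min(combinations(clusters, 2), key=lambda p: dist(*p))
print(x, y, dist(x, y))           # C D 808  -> merge C and D

merged = x + '+' + y
rest = [c for c in clusters if c not in (x, y)]
# Single linkage: distance to the merged cluster is the min over its members.
new_d = {(c, merged): min(dist(c, x), dist(c, y)) for c in rest}
print(new_d)  # {('A', 'C+D'): 2013, ('B', 'C+D'): 2687, ('E', 'C+D'): 1059}
```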

Question 5. [8] k-Means Clustering



Given a set of three points, -2, 0, and 10, we want to use k-Means Clustering with k = 2 to
cluster them into two clusters.
(a) [5] If the initial cluster centers are c1 = -4.0 and c2 = 1.0, show each successive iteration of
k-Means Clustering until no points change cluster, indicating at each iteration which points
belong to each cluster and the coordinates of the two cluster centers.

Iteration 1 (c1 = -4, c2 = 1):

Point   Distance to c1 = -4   Distance to c2 = 1   Cluster
 -2              2                    3                1
  0              4                    1                2
 10             14                    9                2

Updating the centroids: c1 = -2 and c2 = (0 + 10)/2 = 5.

Iteration 2 (c1 = -2, c2 = 5):

Point   Distance to c1 = -2   Distance to c2 = 5   Cluster
 -2              0                    7                1
  0              2                    5                1
 10             12                    5                2

Updating the centroids: c1 = (-2 + 0)/2 = -1 and c2 = 10.

Iteration 3 (c1 = -1, c2 = 10):

Point   Distance to c1 = -1   Distance to c2 = 10   Cluster
 -2              1                   12                 1
  0              1                   10                 1
 10             11                    0                 2

Centroids and clusters are now the same as in the previous iteration, so
stop. The final clustering has two points (-2 and 0) in one cluster and
one point (10) in the other cluster.

(b) [3] Yes or No: k-Means Clustering is guaranteed to find the same final clusters for the
above three points, no matter what the initial cluster center values are.
No. For example, if initially c1 = 0 and c2 = 50, then all three
points will be assigned to cluster 1 and no points will be in
cluster 2. Updating the cluster centers, they become
c1 = (-2 + 0 + 10)/3 = 8/3 ≈ 2.67 and c2 = 50 (no change because it
has no points). These two cluster centers and their associated points
do not change in the next iteration, so the final clustering has
all three points in one cluster and none in the other.
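A minimal 1-D k-Means sketch (illustrative, not part of the exam) that reproduces the iterations in part (a) and the bad-initialization behaviour described in part (b); only the points and the initial centers come from the question.

```python
def kmeans_1d(points, centers, max_iters=100):
    """Lloyd's algorithm in one dimension; a center with no assigned points keeps its value."""
    assignment = None
    for _ in range(max_iters):
        # Assign each point to the nearest center.
        new_assignment = [min(range(len(centers)), key=lambda j: abs(p - centers[j]))
                          for p in points]
        if new_assignment == assignment:      # no point changed cluster -> stop
            break
        assignment = new_assignment
        # Recompute each center as the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assignment, centers

print(kmeans_1d([-2, 0, 10], [-4.0, 1.0]))  # part (a): ([0, 0, 1], [-1.0, 10.0])
print(kmeans_1d([-2, 0, 10], [0.0, 50.0]))  # part (b): ([0, 0, 0], [2.666..., 50.0])
```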


Question 6. [18] k-Nearest Neighbors


(a) [12] You are given a training set of five points and their 2-class classifications (+ or -):
(1.5, +), (3.2, +), (5.4, -), (6.2, -), (8.5, -).
(i) [3] What is the predicted class for a test example at point 4.0 using 1-NN (using
Euclidean distance between points)?
Closest neighbor is 3.2, so predicted class is +
(ii) [3] What is the predicted class for point 4.0 using 3-NN?
Closest three neighbors are 3.2, 5.4, and 6.2, so the majority
class is -
(iii) [3] What is the decision boundary associated with this training set using 1-NN?
(Hint: With 1D data the boundary is defined by a single point.)
The decision boundary is the midpoint between the two
closest points in opposite classes, which here is the
midpoint between 3.2 and 5.4, which is 4.3
(iv) [3] What is the decision boundary when using 3-NN?
Near the boundary the two nearest neighbors are 3.2 and 5.4 (one of
each class), so the majority is decided by the third nearest neighbor,
which switches from 1.5 to 6.2 at their midpoint, 1.5 + 2.35 = 3.85.
So 3.85 is the decision boundary: points to its left are classified +
using 3-NN and points to its right are classified - using 3-NN.
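A minimal Python sketch (illustrative, not part of the exam) of k-NN on this 1-D training set; it reproduces the predictions in (i) and (ii) and probes the decision boundaries from (iii) and (iv).

```python
from collections import Counter

# Training set from part (a): (point, class label)
train = [(1.5, '+'), (3.2, '+'), (5.4, '-'), (6.2, '-'), (8.5, '-')]

def knn_predict(x, k):
    """Majority class among the k training points nearest to x."""
    neighbors = sorted(train, key=lambda pc: abs(pc[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_predict(4.0, k=1))   # '+'  (nearest neighbor is 3.2)
print(knn_predict(4.0, k=3))   # '-'  (neighbors 3.2, 5.4, 6.2)
print(knn_predict(4.29, k=1), knn_predict(4.31, k=1))  # '+' '-'  (1-NN boundary at 4.3)
print(knn_predict(3.84, k=3), knn_predict(3.86, k=3))  # '+' '-'  (3-NN boundary at 3.85)
```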
(b) [3] Say we have a training set consisting of 100 positive examples and 100 negative
examples where each example is a point in a two-dimensional, real-valued feature space.
What will the classification accuracy be if we use the training set data as the testing set with
1-NN?
Since k-NN just memorizes the points and their classes in the
training set, if we use as the testing set the same examples,
then each testing set example will have its 1-NN be the point
itself in the training set. So, the classification accuracy will
be 100%.

(c) [3] k-NN can be thought of as an ensemble learning method using k 1-NN classifiers.
Random forests are another ensemble learning method. For a 2-class classification
problem, what operation is the same in both ensemble methods?
This is a bad question in that k 1-NN classifiers will all classify an
example the same way. What was intended was that the first 1-NN
classifier use the closest neighbor's class, the second use the
second-closest neighbor's class, etc. In any event, the common
operation is the way the classifiers are combined: both k-NN and
random forests obtain the output class by majority vote.


Question 7. [19] Decision Trees


(a) [10] Suppose you are using the Decision Tree Learning algorithm to learn a 2-class
classification variable, C, and you must decide which attribute to assign to a node in the
tree. At this node there are 100 examples; 25 are positive and 75 are negative. If attribute
A is selected, its first child will get 5 positive and 50 negative examples, and its second child
will get 20 positive and 25 negative examples.
(i) [5] Write an expression in terms of logs and fractions for computing the entropy of C,
i.e., H(C). You do not need to simplify this or give a numeric answer.

H(C) = H(25/100, 75/100) = -(25/100 * log2(25/100) + 75/100 * log2(75/100))

(ii) [5] Write an expression in terms of logs and fractions for computing the conditional
entropy (also called Remainder) of choosing attribute A, i.e., H(C | A). You do not
need to simplify this or compute a numeric answer.

H(C|A) = 55/100 * H(5/55, 50/55) + 45/100 * H(20/45, 25/45)
       = .55 * [-5/55 * log2(5/55) - 50/55 * log2(50/55)]
         + .45 * [-20/45 * log2(20/45) - 25/45 * log2(25/45)]
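As a check (not required by the question), the following Python sketch evaluates these expressions numerically and also reports the information gain H(C) - H(C|A); the counts are the ones given in the question.

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

h_c = entropy([25, 75])                                                 # H(C)
h_c_given_a = 55/100 * entropy([5, 50]) + 45/100 * entropy([20, 25])    # H(C | A)
print(round(h_c, 4), round(h_c_given_a, 4), round(h_c - h_c_given_a, 4))
# approximately: 0.8113 0.6877 0.1236
```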


(b) [3] In a problem where each example has n binary attributes, to select the best attribute for
a decision tree node at depth k, where the root node is at depth 0, how many attributes are
candidates for selection at this node?
n - k

(c) [3] Say we use the following method to prune a decision tree: Iteratively remove non-leaf
nodes using a tuning set equal to the training set until no improvement is made in the
classification accuracy on the tuning set. How will the final pruned tree compare to the
original decision tree in terms of classification accuracy? Justify your answer.
Using the tuning set to be the same as the training set means
that the accuracy with the original decision tree will be 100% on
the tuning set, so no pruning will be done because no pruning can
improve the accuracy on the tuning set. So, the final pruned
tree will be the original decision tree.

(d) [3] After constructing a decision tree from a training set that contains many attributes you
find that the training set accuracy is very high but the testing set accuracy is low. Explain
what the likely cause of this situation is and what might be done to fix it.
The likely cause is overfitting the training data. One possible
solution is to prune the tree. Alternatively, use a random
forest.

