
AB1202 – Statistics and Quantitative Methods In-Lecture Exercise Solutions

AB1202 – STATISTICS AND QUANTITATIVE METHODS


In-Lecture Exercise (ILE) Solutions (Week 13)
Lecture 13 – Cluster Analysis

Q1. Which of the following is NOT a goal or reason for using cluster analysis?
(a) when there is a need to regress non-linear data with linear regression [CORRECT]
(b) when we wonder if our customers are made up of 3 different groups with different needs
(c) when we want to remove truly outlying data values so as not to get unreliable regression models

Q2. What is the Euclidean distance between the two data points p1 = (3, -5, 2, 9) and p2 = (-2, 1, 6, 7)?
(a) 17
(b) 9 [CORRECT]
(c) 6
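The answer to Q2 can be verified with a short Python sketch (the function name `euclidean` is ours, not from the course materials):

```python
import math

def euclidean(p, q):
    """Euclidean distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

p1 = (3, -5, 2, 9)
p2 = (-2, 1, 6, 7)
# Squared differences: 5^2 + (-6)^2 + (-4)^2 + 2^2 = 25 + 36 + 16 + 4 = 81
print(euclidean(p1, p2))  # 9.0
```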

Q3. Just by inspection and without using any calculation device, what would be a
practically acceptable result of clustering the following data: { 9, 205, 232, 4, 217, 3 }
(a) 2 Clusters: { 9, 205, 232 } and { 4, 217, 3 }
(b) 2 Clusters: { 3, 4, 9 } and { 205, 217, 232 } [CORRECT]
(c) 5 Clusters: { 3, 4 }, { 9 }, { 205 }, { 217 }, { 232 }

Q4. Using K-Means clustering to group the data { 9, 205, 232, 4, 217, 3 } into 2 clusters, A and B, with initial centers { 10 } and { 200 } respectively, which cluster would each data point be associated with after one step of clustering calculations? (Recall that K-Means uses the squared-Euclidean distance.) (Note that the cluster names are not important during intermediate calculations so long as they are distinct.)
(a) 9B, 205A, 232A, 4A, 217B, 3B
(b) 9A, 205A, 232A, 4B, 217B, 3B
(c) 9A, 205B, 232B, 4A, 217B, 3A [CORRECT]

Q5. Continuing from Q4, where we used K-Means clustering to group the data { 9, 205, 232, 4, 217, 3 } into 2 clusters with initial centers { 10 } and { 200 }, what would the new cluster centers be at the end of the first step? (Recall that K-Means uses the squared-Euclidean distance.)
(a) A: { 5.3333 }, B: { 218 } [CORRECT]
(b) A: { 148.6667 }, B: { 74.6667 }
(c) A: { 28.6667 }, B: { 446 }
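The single K-Means step behind Q4 and Q5 can be reproduced with a minimal Python sketch for one-dimensional data (the helper name `kmeans_step` is ours, not from the course materials): assign each point to the nearest center by squared distance, then recompute each center as its cluster's mean.

```python
def kmeans_step(data, centers):
    """One K-Means iteration: assign each point to the nearest center
    (squared distance), then recompute each center as its cluster mean."""
    clusters = {c: [] for c in centers}
    for x in data:
        nearest = min(centers, key=lambda c: (x - c) ** 2)
        clusters[nearest].append(x)
    new_centers = [sum(members) / len(members) for members in clusters.values()]
    return clusters, new_centers

data = [9, 205, 232, 4, 217, 3]
clusters, new_centers = kmeans_step(data, [10, 200])
print(clusters)     # {10: [9, 4, 3], 200: [205, 232, 217]}  -- matches Q4 (c)
print(new_centers)  # [5.333..., 218.0]                       -- matches Q5 (a)
```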

Q6. Which of the following is true about agglomerative clustering?
(a) Agglomerative clustering is a non-hierarchical cluster analysis method
(b) Agglomerative clustering starts off assigning each data point to its own solo cluster [CORRECT]
(c) Agglomerative clustering starts off by agglomerating all the data points into a single cluster

2017SEM2 Page 1

Q7. A set of data points { 3, 6, 15, 2, 42 } is to be clustered using the agglomerative clustering method. At the first step, which two points would be merged?
(a) Points 3 and 6
(b) Points 6 and 2
(c) Points 3 and 2 [CORRECT]
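The first agglomerative merge in Q7 is simply the closest pair of points. A minimal Python check (the helper name `first_merge` is ours, not from the course materials):

```python
from itertools import combinations

def first_merge(points):
    """Agglomerative clustering starts with each point in its own solo cluster;
    the first step merges the pair of points with the smallest distance."""
    return min(combinations(points, 2), key=lambda pair: abs(pair[0] - pair[1]))

print(first_merge([3, 6, 15, 2, 42]))  # (3, 2) -- distance 1, the smallest gap
```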

Q8. Which of the following does NOT describe divisive clustering?
(a) Divisive clustering tests whether the distances are divisible by 2 before clustering the data points [CORRECT]
(b) Divisive clustering starts off by assigning all data points to a single cluster and successively divides the temporary clusters until each cluster has a single data point
(c) Under divisive clustering, once a data point is assigned to a solo cluster (having only one data point), the data point will not be re-assigned to another solo cluster

Q9. Divisive clustering is being applied to the set of data points { 4, 6, 3, 1, 7, 8 }. Assume that if there is a tie in distance values, the clustering software takes the data point with the smaller data value. At the start of the divisive clustering, which two data points will be the starting points for dividing the set into two smaller clusters?
(a) Points 1 and 3
(b) Points 1 and 8 [CORRECT]
(c) Points 3 and 4
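The split seeds in Q9 are the two most distant points in the cluster. A minimal Python sketch of that selection (the helper name `splinter_pair` is ours; sorting the points first implements the tie rule of preferring smaller values, since `max` keeps the first of any tied pairs):

```python
from itertools import combinations

def splinter_pair(points):
    """Divisive clustering starts with all points in one cluster and uses the
    two most distant points as seeds for the first split. Iterating pairs in
    sorted order means ties resolve in favour of smaller data values."""
    return max(combinations(sorted(points), 2),
               key=lambda pair: abs(pair[1] - pair[0]))

print(splinter_pair([4, 6, 3, 1, 7, 8]))  # (1, 8) -- the two most distant points
```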

Q10. Which of the following statements is TRUE?
(a) Given a fixed set of data points, the K-Means, agglomerative and divisive clustering methods always give the same clustering results
(b) K-Means can also be described as a hierarchical clustering method because it hierarchically sorts the data points into the given K clusters
(c) Agglomerative and divisive clustering produce dendrograms from which the desired number of clusters can be cut out, whereas K-Means does not produce dendrograms [CORRECT]
