Q1. Which of the following is NOT a goal of, or reason for, using cluster analysis?
(a) when there is a need to regress non-linear data with linear regression [CORRECT]
(b) when we wonder if our customers are made up of 3 different groups with
different needs
(c) when we want to remove truly outlying data values so as not to get unreliable
regression models.
Q2. What is the Euclidean distance for the two data points p1 = (3, -5, 2, 9) and p2 = (-2,
1, 6, 7)?
(a) 17
(b) 9 [CORRECT]
(c) 6
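The arithmetic behind Q2 can be checked with a short sketch. The function name `euclidean_distance` is an illustrative choice, not something from the exercise; the points are the ones given in the question.

```python
import math

# Data points from Q2.
p1 = (3, -5, 2, 9)
p2 = (-2, 1, 6, 7)

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Differences are 5, -6, -4, 2; their squares 25 + 36 + 16 + 4 sum to 81.
distance = euclidean_distance(p1, p2)
print(distance)  # 9.0
```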
Q3. Just by inspection and without using any calculation device, what would be a
practically acceptable result of clustering the following data: { 9, 205, 232, 4, 217, 3 }?
(a) 2 Clusters: { 9, 205, 232 } and { 4, 217, 3 }
(b) 2 Clusters: { 3, 4, 9 } and { 205, 217, 232 } [CORRECT]
(c) 5 Clusters: { 3, 4 }, { 9 }, { 205 }, { 217 }, { 232 }
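One rough way to mimic the "by inspection" reasoning in Q3 is to sort the values and cut at the widest gap. This is only an illustrative heuristic for one-dimensional data, not a method stated in the exercise, and `split_at_largest_gap` is a hypothetical helper name.

```python
# Data from Q3.
data = [9, 205, 232, 4, 217, 3]

def split_at_largest_gap(values):
    """Sort the values and split them into two groups at the widest gap."""
    ordered = sorted(values)
    # Gap between each pair of neighbouring sorted values.
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    cut = gaps.index(max(gaps)) + 1
    return ordered[:cut], ordered[cut:]

low, high = split_at_largest_gap(data)
print(low, high)  # [3, 4, 9] [205, 217, 232]
```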
Q4. Using K-Means clustering to group the data { 9, 205, 232, 4, 217, 3 } into 2 clusters, A and
B, with initial centers { 10 } and { 200 } respectively, which cluster would each data point be
assigned to after one step of clustering calculations? (Recall that K-Means uses the squared
Euclidean distance as its distance function. Note that the names of the clusters are not
important during intermediate calculations so long as they are distinct.)
(a) 9B, 205A, 232A, 4A, 217B, 3B
(b) 9A, 205A, 232A, 4B, 217B, 3B
(c) 9A, 205B, 232B, 4A, 217B, 3A [CORRECT]
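The assignment step in Q4 can be sketched as follows: each point goes to the center with the smallest squared distance. The names `assign_clusters` and `labels` are illustrative; the data and initial centers come from the question.

```python
# Data and initial centers from Q4.
data = [9, 205, 232, 4, 217, 3]
centers = {"A": 10, "B": 200}

def assign_clusters(points, centers):
    """Map each point to the name of the center with the smallest squared distance."""
    assignment = {}
    for x in points:
        assignment[x] = min(centers, key=lambda name: (x - centers[name]) ** 2)
    return assignment

labels = assign_clusters(data, centers)
print(labels)  # {9: 'A', 205: 'B', 232: 'B', 4: 'A', 217: 'B', 3: 'A'}
```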
Q5. Continuing from Q4, where we used K-Means clustering to group the data { 9, 205, 232, 4,
217, 3 } into 2 clusters with initial centers { 10 } and { 200 }, what would the new cluster
centers be at the end of the first step? (Recall that K-Means uses the squared Euclidean
distance as its distance function.)
(a) A: { 5.3333 }, B: { 218 } [CORRECT]
(b) A: { 148.6667 }, B: { 74.6667 }
(c) A: { 28.6667 }, B: { 446 }
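The update step in Q5 replaces each center with the mean of the points assigned to it. A minimal sketch, taking the cluster memberships from the Q4 answer as given (the variable names are illustrative):

```python
# Cluster memberships after the first assignment step (the Q4 answer).
clusters = {"A": [9, 4, 3], "B": [205, 232, 217]}

# Each new center is the arithmetic mean of its cluster's members.
new_centers = {name: sum(members) / len(members) for name, members in clusters.items()}

# A: (9 + 4 + 3) / 3 = 16 / 3; B: (205 + 232 + 217) / 3 = 654 / 3.
print({name: round(center, 4) for name, center in new_centers.items()})
# {'A': 5.3333, 'B': 218.0}
```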
2017SEM2 Page 1
AB1202 – Statistics and Quantitative Methods In-Lecture Exercise Solutions