step 1, and assign a cluster to each data sample according to k-means. You'll want to make
both k and the number of iterations variables that can be altered in your code.
Tasks
i. Implement K-means clustering
ii. Compare the clusters produced for different values of k (e.g. 2, 6, 12)
iii. Output the sum of squared distances (SSE) for each cluster, and an overall score
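The tasks above can be sketched as follows. This is a minimal illustration, not a reference solution: it uses only the standard library (NumPy would be the more usual choice), and the names `k_means`, `assign`, `squared_distance` and the toy data are assumptions made for the example. Initial centers are simply drawn at random from the samples.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(data, centers):
    """Index of the nearest center for every sample."""
    return [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in data]

def k_means(data, k, n_iters, seed=0):
    """Return (labels, centers, per_cluster_sse, total_sse)."""
    rng = random.Random(seed)
    # Assumption: initial centers are k samples drawn at random.
    centers = [list(p) for p in rng.sample(data, k)]
    for _ in range(n_iters):
        # Assignment step: each sample joins its nearest center.
        labels = assign(data, centers)
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(data, centers)  # final assignment against the final centers
    per_cluster_sse = [0.0] * k
    for p, lab in zip(data, labels):
        per_cluster_sse[lab] += squared_distance(p, centers[lab])
    return labels, centers, per_cluster_sse, sum(per_cluster_sse)

# Toy data (made up for illustration): two well-separated groups.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for k in (2, 3):
    labels, centers, sse, total = k_means(toy, k, n_iters=10)
    print(k, labels, sse, total)
```

Both k and n_iters are plain arguments, so comparing k = 2, 6, 12 on the real data is a matter of changing the values in the loop.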
4. Initialization [4 Marks]
We have seen in class that the behavior of k-means can depend on the initialization
stage, i.e. where the initial cluster centers are placed.
Tasks
i. Include a mechanism for initializing the cluster centers that adds stability to the
resulting cluster assignments. You should justify that this works based on the output
of part 3.iii.
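One commonly used option, shown here only as an example and not necessarily the mechanism covered in class, is k-means++ style seeding: the first center is a random sample, and each later center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function name `kmeans_pp_init` and its parameters are assumptions made for this sketch.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(data, k, seed=0):
    """Pick k initial centers with k-means++ style seeding."""
    rng = random.Random(seed)
    centers = [list(rng.choice(data))]
    while len(centers) < k:
        # Squared distance from each sample to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in data]
        if sum(d2) == 0:
            # Every sample coincides with a chosen center; fall back to uniform.
            centers.append(list(rng.choice(data)))
            continue
        # Far-away samples are more likely to become the next center.
        centers.append(list(rng.choices(data, weights=d2, k=1)[0]))
    return centers
```

To justify the choice with the output of part 3.iii., one could run the clustering over several seeds with and without this seeding and compare how much the overall SSE varies between runs.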
5. Analysis [4 Marks]
Tasks
i. Comment on how well the data is separated into distinct clusters (remember also that
you don't typically have labels when clustering).
ii. Samples in this data are sequential, but are being treated as independent
observations. How might this knowledge be included to produce a different result from
clustering?
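As one possible illustration of 5.ii. (an assumption, not the expected answer), the position of each sample in the sequence can be appended as an extra, scaled feature, so that samples close together in time are pulled toward the same cluster. The helper name `add_time_feature` and the `weight` parameter are hypothetical.

```python
def add_time_feature(data, weight=1.0):
    """Append a scaled position index in [0, weight] to every sample."""
    span = max(len(data) - 1, 1)  # avoid dividing by zero for a single sample
    return [list(p) + [weight * i / span] for i, p in enumerate(data)]
```

A larger `weight` makes temporal closeness count for more relative to the original features; `weight=0` appends a constant zero column, which leaves all distances, and so the clustering, unchanged.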
TO HAND IN
You should hand in a .zip file named yourlastname_yourfirstname_A1.zip to the D2L dropbox.
This .zip should contain the following:
1. A Python (.py) file named myKMeans.py that implements the steps outlined above.
2. A PDF file (yourlastname_yourfirstname.pdf) that includes the following:
a. A visualization of the scatter plot produced in 2.ii. and 2.iii. (one plot is fine)
b. 1 paragraph describing your solution to part 4. (If not implemented, say so here.)
c. 1 paragraph with your comments on 5.i.
d. 1 paragraph with your comments on 5.ii.