
Using XLStat for K-Means Clustering

You can download XLStat from www.xlstat.com. The student version is available for
$50, and a free evaluation version (with all the required capabilities) is available for a
30-day period. XLStat provides a good tutorial on running K-means clustering with this
software at: http://www.xlstat.com/en/support/tutorials/cluster2.htm.
In order to view the outputs from XLStat properly, you first need to change the security
settings in Excel. To do so, open Excel Options by clicking the Windows icon in the top-left
corner of Excel. Within Excel Options, select Trust Center, and within Trust Center, select
Trust Center Settings.

Within Trust Center Settings, select Macro Settings and check Trust access to
Visual Basic Project.

Activate XLStat by clicking the XLStat logo in the Excel toolbar. This opens the XLStat
menu. You can then choose K-means clustering from the Analyzing data options in the
XLStat toolbar.

Once you choose K-means clustering, you will see the K-means clustering dialog box.

In this dialog box, the Observations/variables table option allows you to choose the data
to be used for the cluster analysis. Choose the answers to the 62 attitudinal questions for the
250 respondents from the Psychographic Data sheet in the Ford Ka (students) file. Please
note that you should select only the columns that contain the answers to the 62
attitudinal questions and NOT the column containing the respondent number. You
are choosing only the variables that form the basis for clustering; the respondent
number is not a clustering variable.
Choose Determinant(W) as the clustering criterion. This is a popular and safe criterion for
clustering; you can choose another criterion instead, and the results do not change much.
Details on this criterion are beyond the scope of this class. Next, specify the Number of
classes (i.e., clusters) you want the K-means clustering algorithm to find. For this exercise,
you will need to repeat the cluster analysis, varying the Number of classes from 3 to 6.
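For the curious, the sketch below shows what the Determinant(W) criterion measures. W is the pooled within-class scatter matrix: for every respondent, the deviation of their answers from their cluster's centroid is multiplied by itself (an outer product), and these products are summed over all respondents. The criterion prefers partitions with a smaller det(W), i.e., tighter clusters. This is a minimal plain-Python sketch with hypothetical function names (within_scatter, determinant) and toy data, not XLStat's implementation.

```python
def within_scatter(data, labels):
    """Pooled within-class scatter matrix W for a given partition.

    data: one list of p numbers per observation
    labels: cluster label of each observation (same order as data)
    """
    p = len(data[0])
    # Centroid of each cluster
    sums, counts = {}, {}
    for x, k in zip(data, labels):
        s = sums.setdefault(k, [0.0] * p)
        for j in range(p):
            s[j] += x[j]
        counts[k] = counts.get(k, 0) + 1
    centroids = {k: [v / counts[k] for v in s] for k, s in sums.items()}
    # Sum the outer products (x - c)(x - c)^T over all observations
    W = [[0.0] * p for _ in range(p)]
    for x, k in zip(data, labels):
        d = [x[j] - centroids[k][j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                W[a][b] += d[a] * d[b]
    return W


def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return det


# Toy example: two well-separated groups of 4 points in 2 dimensions
data = [[0, 0], [2, 0], [0, 2], [2, 2], [10, 10], [12, 10], [10, 12], [12, 12]]
good = [0, 0, 0, 0, 1, 1, 1, 1]  # keeps the two groups apart
bad = [0, 1, 0, 1, 0, 1, 0, 1]   # mixes the two groups
```

On this toy data, the partition that keeps the two groups apart gives det(W) = 64, while the mixed partition gives det(W) = 1600, so the criterion prefers the first.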

In the Options tab, choose Cluster by rows, since you are clustering respondents and
they are arranged in rows in the data. You can keep the default Initial partition option (random)
for this exercise. The random initial partition provides random starting centroids for the
clustering algorithm. Set Repetitions to 100. This means that the whole K-means procedure
is run 100 times, each time from a different random initial partition, and the best solution
according to the chosen criterion is kept. Under the stop conditions, set Iterations to 50 and
Convergence to 0.001. This means that each run stops after at most 50 iterations, or earlier
if the criterion changes by less than 0.001 from one iteration to the next. These settings are
sufficient for our data. Please refer to the technical note: Cluster Analysis for Market
Segmentation, for further details.
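To make the Repetitions, Iterations, and Convergence settings concrete, here is a minimal plain-Python sketch of K-means started from a random initial partition. One run alternates between assigning each observation to its nearest centroid and recomputing the centroids, stopping after max_iter iterations or once the criterion improves by less than tol; the whole run is repeated `repetitions` times and the best run is kept. Assumptions to note: this sketch minimizes the within-class sum of squares (the Trace(W) criterion, as in standard K-means) rather than Determinant(W), it re-seeds an empty class at a random observation, and the function names and parameters are hypothetical. XLStat's internals may differ.

```python
import random


def sq_dist(x, c):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def centroids_of(data, labels, k, rng):
    """Mean of each class; an empty class is re-seeded at a random observation."""
    cents = []
    for c in range(k):
        members = [x for x, lab in zip(data, labels) if lab == c]
        if not members:
            members = [rng.choice(data)]
        cents.append([sum(col) / len(members) for col in zip(*members)])
    return cents


def within_ss(data, labels, cents):
    """Trace(W): total squared distance of observations to their class centroid."""
    return sum(sq_dist(x, cents[lab]) for x, lab in zip(data, labels))


def kmeans_once(data, k, max_iter, tol, rng):
    # Random initial partition: each observation is put in a random class
    labels = [rng.randrange(k) for _ in data]
    cents = centroids_of(data, labels, k, rng)
    crit = within_ss(data, labels, cents)
    for _ in range(max_iter):
        # Reassign every observation to its nearest centroid, then recompute centroids
        labels = [min(range(k), key=lambda c: sq_dist(x, cents[c])) for x in data]
        cents = centroids_of(data, labels, k, rng)
        new_crit = within_ss(data, labels, cents)
        # Converged once the criterion improves by less than tol
        converged = crit - new_crit < tol
        crit = new_crit
        if converged:
            break
    return labels, crit


def kmeans(data, k, repetitions=100, max_iter=50, tol=0.001, seed=0):
    """Run K-means `repetitions` times from random starts and keep the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(repetitions):
        labels, crit = kmeans_once(data, k, max_iter, tol, rng)
        if best is None or crit < best[1]:
            best = (labels, crit)
    return best
```

For example, kmeans(data, 4) would mirror one of the 3-to-6-class runs described above, with data holding one list of 62 answers per respondent. Keeping the best of many random starts guards against a single unlucky starting partition.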

In the Missing data tab, choose the Do not accept missing data option. We do not
have missing data, so it will not matter.

In the Outputs tab, select (a) Descriptive statistics; (b) Optimization summary, which
provides the within-class and between-class variance required for constructing the
elbow plot; and (c) Results by object, which tells you the cluster each observation (i.e.,
respondent) belongs to (required for profiling and summarizing the results of the cluster
analysis).
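The Optimization summary relies on a standard identity: the total sum of squares of the data splits into a within-class part (how spread out respondents are inside their clusters) and a between-class part (how far the cluster centroids lie from the overall mean). The plain-Python sketch below computes both for a given partition. The function name is hypothetical, and XLStat may report these as variances (scaled, e.g., by the number of observations) rather than as raw sums of squares. An elbow plot is typically built by plotting the within-class part (or its share of the total) against the Number of classes; see the technical note for the exact version used in this class.

```python
def variance_decomposition(data, labels):
    """Split the total sum of squares into within-class and between-class parts."""
    n, p = len(data), len(data[0])
    grand = [sum(x[j] for x in data) / n for j in range(p)]  # overall mean
    groups = {}
    for x, lab in zip(data, labels):
        groups.setdefault(lab, []).append(x)
    within = between = 0.0
    for members in groups.values():
        cent = [sum(col) / len(members) for col in zip(*members)]
        # Spread of the cluster's members around their own centroid
        within += sum(sum((a - b) ** 2 for a, b in zip(x, cent)) for x in members)
        # Distance of the centroid from the overall mean, weighted by cluster size
        between += len(members) * sum((a - b) ** 2 for a, b in zip(cent, grand))
    total = sum(sum((a - b) ** 2 for a, b in zip(x, grand)) for x in data)
    return within, between, total


# Toy example: two groups of 4 points
data = [[0, 0], [2, 0], [0, 2], [2, 2], [10, 10], [12, 10], [10, 12], [12, 12]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
within, between, total = variance_decomposition(data, labels)
```

Here within + between equals total (16 + 400 = 416), which is why a small within-class figure goes hand in hand with a large between-class figure.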

Finally, uncheck Evolution of the criterion in the Charts tab. This chart is not
necessary for us.

Once you have constructed the elbow plot (as described in the technical note: Cluster Analysis
for Market Segmentation), you will need to run the cluster analysis one more time
with the optimal number of clusters you selected. You would then create a new column
in the Psychographic Data sheet that contains the cluster membership of each
respondent. You can then use a pivot table to obtain the means of the 62 attitudinal
questions for each cluster. These means can be used to understand the composition of
each cluster (i.e., to name them!).
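Outside Excel, the same pivot-table step (the average answer to each question within each cluster) can be sketched in plain Python as below. It assumes the answers are held as one list per respondent and the cluster memberships as a parallel list, matching the new column described above; the function name and the toy data are hypothetical.

```python
def cluster_means(answers, clusters):
    """Average answer to each question within each cluster, like a pivot table of means.

    answers: one list of question answers per respondent
    clusters: cluster membership of each respondent (same order as answers)
    """
    sums, counts = {}, {}
    for row, c in zip(answers, clusters):
        if c not in sums:
            sums[c] = [0.0] * len(row)
            counts[c] = 0
        sums[c] = [s + v for s, v in zip(sums[c], row)]
        counts[c] += 1
    # One row of means per cluster, in cluster order
    return {c: [s / counts[c] for s in sums[c]] for c in sorted(sums)}


# Toy example: 4 respondents, 2 questions, 2 clusters
answers = [[1, 5], [3, 3], [6, 2], [4, 4]]
clusters = [1, 1, 2, 2]
means = cluster_means(answers, clusters)
```

Reading across a cluster's row of means, the questions on which it scores noticeably higher or lower than the other clusters are the ones that help you name it.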
