
ADVANCED DATA ANALYSIS

Profiling based on Internet activities


(Cluster Analysis: Hierarchical and k-means)

Submitted to:
Prof. Shailaja Rego

Submitted By:
Deepak Joshi (I027)
Hardik Ranka (I048)

Research Objective: To map the profile of individuals based on internet use activities
Data Source: Textbook named SPSS 17.0 for Researchers by Dr. S.L. Gupta

Cluster Analysis
ABOUT THE DATA:
We used data from 31 respondents to map their profiles based on internet use activities. The
respondents answered 16 questions on a 5-point rating scale (1-5).
Rating scale: Never-1, Occasionally-2, Considerably-3, Almost always-4, Always-5
Variables used:
1. Collecting product/service information and specification
2. Collecting information of current vendor
3. Searching and collecting information of new vendor
4. Collecting competitive and other information for purchase
5. Cost/price comparison
6. Email
7. Web conferencing
8. Electronic Data Interchange (EDI)
9. Discussion Groups
10. Just in time inventory planning
11. Online Negotiation
12. Online Bidding
13. Online Payment
14. Online Ordering
15. Online Status Checking
16. Online product/Service support

Step 1:
Hierarchical Clustering: Determining the number of clusters

Case Processing Summary(a,b)
Cases: Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent
31 | 100.0 | 0 | .0 | 31 | 100.0
a. Squared Euclidean Distance used
b. Ward Linkage

This table shows that all 31 cases are valid (there are no missing values).
Agglomeration Schedule
Stage | Coefficients
1 | 1.000
2 | 2.000
3 | 3.500
4 | 5.000
5 | 6.500
6 | 8.000
7 | 10.500
8 | 13.000
9 | 15.500
10 | 18.000
11 | 21.000
12 | 24.000
13 | 27.000
14 | 30.500
15 | 34.500
16 | 38.500
17 | 42.750
18 | 47.250
19 | 52.200
20 | 57.450
21 | 63.450
22 | 70.700
23 | 79.075
24 | 87.589
25 | 96.389
26 | 106.681
27 | 117.798
28 | 135.798
29 | 166.756
30 | 220.452

The agglomeration schedule shows a sudden jump in the coefficient at stage 28, from 117.798 to
135.798 (an increase of 18.000, compared with 11.117 at the previous stage).
Hence,
No. of clusters = Total sample size - 28 = 31 - 28 = 3
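The jump can be checked numerically from the Coefficients column. The following minimal Python sketch is not part of the SPSS output. It hard-codes the 30 coefficients from the agglomeration schedule, prints the stage-to-stage increases, and applies the rule used above (number of clusters = number of cases minus the stage). The helper names `stage_increments` and `clusters_remaining` are illustrative only. Deciding which increase counts as the "sudden jump" is still a judgment call.

```python
# Coefficients from the agglomeration schedule (Ward linkage), stages 1..30.
COEFFICIENTS = [
    1.000, 2.000, 3.500, 5.000, 6.500, 8.000, 10.500, 13.000, 15.500, 18.000,
    21.000, 24.000, 27.000, 30.500, 34.500, 38.500, 42.750, 47.250, 52.200, 57.450,
    63.450, 70.700, 79.075, 87.589, 96.389, 106.681, 117.798, 135.798, 166.756, 220.452,
]


def stage_increments(coefs):
    """Map each stage (2..n) to the increase in coefficient over the previous stage."""
    return {stage: round(coefs[stage - 1] - coefs[stage - 2], 3)
            for stage in range(2, len(coefs) + 1)}


def clusters_remaining(n_cases, stage):
    """Clusters left after `stage` merges: each merge removes exactly one cluster."""
    return n_cases - stage


if __name__ == "__main__":
    increments = stage_increments(COEFFICIENTS)
    for stage in range(24, 31):
        print(f"stage {stage}: coefficient {COEFFICIENTS[stage - 1]:.3f}, "
              f"increase {increments[stage]:.3f}")
    print("clusters after stage 28:", clusters_remaining(31, 28))
```

Running it shows the increase rising to 18.000 at stage 28, up from 11.117 at stage 27. The final two stages show even larger increases (30.958 and 53.696).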

Dendrogram using Ward Method (Rescaled Distance Cluster Combine, scale 0 to 25)
Cases in dendrogram order, top to bottom: 19, 20, 8, 9, 21, 12, 18, 10, 17, 14, 16, 1, 29, 30, 28, 11, 22, 25, 31, 27, 24, 26, 15, 23, 2, 13, 6, 7, 4, 5, 3

The dendrogram above shows that the longest horizontal lines belong to the 3-cluster solution,
marked by the thick dotted line (the dotted line intersects three horizontal lines). The cluster
containing cases 19, 20, 8, 9, 21, 12, 18, 10, 17, 14, 16 and 1 is named cluster 1; the cluster containing
cases 2, 13, 6, 7, 4, 5 and 3 is named cluster 2; and the cluster containing cases 29, 30, 28, 11, 22, 25, 31,
27, 24, 26, 15 and 23 is named cluster 3. The cluster membership also shows similar results: the
Descriptives table reports 12, 7 and 12 cases in clusters 1, 2 and 3 respectively.
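The same procedure (Ward linkage, then cutting the tree into three clusters) can be sketched with SciPy. This report does not reproduce the raw ratings, so the sketch uses a hypothetical, randomly generated 31 x 16 matrix of 1-5 ratings; you would substitute the actual responses. Several assumptions apply. SciPy's `linkage(..., method="ward")` reports merge heights on a Euclidean scale rather than SPSS's coefficients. The sketch treats SPSS's Ward coefficient as the cumulative within-cluster sum of squares, so half of each squared height is cumulated to get a comparable column. SciPy may also number the clusters in a different order from SPSS.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical stand-in for the survey: 31 respondents x 16 activities, rated 1-5.
rng = np.random.default_rng(seed=0)
ratings = rng.integers(1, 6, size=(31, 16))

# Ward (minimum-variance) linkage on Euclidean distances between respondents.
Z = linkage(ratings, method="ward")

# Each Ward merge raises the within-cluster sum of squares by height**2 / 2;
# the running total is assumed to be comparable to SPSS's "Coefficients".
ss_coefficients = np.cumsum(Z[:, 2] ** 2 / 2)

# Cut the tree so that at most three clusters remain.
labels = fcluster(Z, t=3, criterion="maxclust")

for case, label in enumerate(labels, start=1):
    print(f"case {case}: cluster {label}")
print("last three coefficients:", np.round(ss_coefficients[-3:], 3))
```

With the real responses in place of `ratings`, you could compare `labels` case by case against the memberships read off the dendrogram.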

Descriptives
Cluster | N | Mean | Std. Deviation | Std. Error | 95% Confidence Interval for Mean: Lower Bound | Upper Bound | Minimum | Maximum

Collecting product/service information
1 | 12 | 4.00 | .000 | .000 | 4.00 | 4.00 | 4 | 4
2 | 7 | 4.14 | .378 | .143 | 3.79 | 4.49 | 4 | 5
3 | 12 | 4.00 | .426 | .123 | 3.73 | 4.27 | 3 | 5
Total | 31 | 4.03 | .315 | .056 | 3.92 | 4.15 | 3 | 5

Collecting information of current vendor
1 | 12 | 3.00 | .603 | .174 | 2.62 | 3.38 | 2 | 4
2 | 7 | 4.43 | .535 | .202 | 3.93 | 4.92 | 4 | 5
3 | 12 | 2.67 | .778 | .225 | 2.17 | 3.16 | 2 | 4
Total | 31 | 3.19 | .946 | .170 | 2.85 | 3.54 | 2 | 5

Searching and collecting information of new vendor
1 | 12 | 4.42 | .515 | .149 | 4.09 | 4.74 | 4 | 5
2 | 7 | 4.57 | .535 | .202 | 4.08 | 5.07 | 4 | 5
3 | 12 | 4.67 | .492 | .142 | 4.35 | 4.98 | 4 | 5
Total | 31 | 4.55 | .506 | .091 | 4.36 | 4.73 | 4 | 5

Collecting competitive and other information for purchase
1 | 12 | 4.08 | .669 | .193 | 3.66 | 4.51 | 3 | 5
2 | 7 | 4.29 | .756 | .286 | 3.59 | 4.98 | 3 | 5
3 | 12 | 4.00 | .000 | .000 | 4.00 | 4.00 | 4 | 4
Total | 31 | 4.10 | .539 | .097 | 3.90 | 4.29 | 3 | 5

Cost/price comparison
1 | 12 | 4.00 | .000 | .000 | 4.00 | 4.00 | 4 | 4
2 | 7 | 3.71 | .756 | .286 | 3.02 | 4.41 | 3 | 5
3 | 12 | 3.33 | .651 | .188 | 2.92 | 3.75 | 2 | 4
Total | 31 | 3.68 | .599 | .108 | 3.46 | 3.90 | 2 | 5

Email
1 | 12 | 4.58 | .515 | .149 | 4.26 | 4.91 | 4 | 5
2 | 7 | 4.86 | .378 | .143 | 4.51 | 5.21 | 4 | 5
3 | 12 | 4.25 | .452 | .131 | 3.96 | 4.54 | 4 | 5
Total | 31 | 4.52 | .508 | .091 | 4.33 | 4.70 | 4 | 5

Web conferencing
1 | 12 | 3.42 | .669 | .193 | 2.99 | 3.84 | 3 | 5
2 | 7 | 3.29 | .488 | .184 | 2.83 | 3.74 | 3 | 4
3 | 12 | 2.92 | .515 | .149 | 2.59 | 3.24 | 2 | 4
Total | 31 | 3.19 | .601 | .108 | 2.97 | 3.41 | 2 | 5

Electronic Data Interchange
1 | 12 | 3.00 | .739 | .213 | 2.53 | 3.47 | 2 | 4
2 | 7 | 4.00 | .000 | .000 | 4.00 | 4.00 | 4 | 4
3 | 12 | 2.58 | .793 | .229 | 2.08 | 3.09 | 2 | 4
Total | 31 | 3.06 | .854 | .153 | 2.75 | 3.38 | 2 | 4

Discussion Groups
1 | 12 | 2.83 | .577 | .167 | 2.47 | 3.20 | 2 | 4
2 | 7 | 3.00 | .000 | .000 | 3.00 | 3.00 | 3 | 3
3 | 12 | 1.25 | .452 | .131 | .96 | 1.54 | 1 | 2
Total | 31 | 2.26 | .930 | .167 | 1.92 | 2.60 | 1 | 4

Just-in-time inventory planning
1 | 12 | 3.42 | .996 | .288 | 2.78 | 4.05 | 2 | 5
2 | 7 | 3.86 | .378 | .143 | 3.51 | 4.21 | 3 | 4
3 | 12 | 2.75 | .452 | .131 | 2.46 | 3.04 | 2 | 3
Total | 31 | 3.26 | .815 | .146 | 2.96 | 3.56 | 2 | 5

Online negotiation
1 | 12 | 4.25 | .754 | .218 | 3.77 | 4.73 | 3 | 5
2 | 7 | 2.71 | .488 | .184 | 2.26 | 3.17 | 2 | 3
3 | 12 | 4.00 | .000 | .000 | 4.00 | 4.00 | 4 | 4
Total | 31 | 3.81 | .792 | .142 | 3.52 | 4.10 | 2 | 5

Online bidding
1 | 12 | 1.83 | .835 | .241 | 1.30 | 2.36 | 1 | 4
2 | 7 | 2.86 | .900 | .340 | 2.03 | 3.69 | 2 | 4
3 | 12 | 1.50 | .522 | .151 | 1.17 | 1.83 | 1 | 2
Total | 31 | 1.94 | .892 | .160 | 1.61 | 2.26 | 1 | 4

Online payment
1 | 12 | 4.25 | .622 | .179 | 3.86 | 4.64 | 3 | 5
2 | 7 | 4.43 | .535 | .202 | 3.93 | 4.92 | 4 | 5
3 | 12 | 3.50 | .522 | .151 | 3.17 | 3.83 | 3 | 4
Total | 31 | 4.00 | .683 | .123 | 3.75 | 4.25 | 3 | 5

Online ordering
1 | 12 | 4.08 | .289 | .083 | 3.90 | 4.27 | 4 | 5
2 | 7 | 4.14 | .378 | .143 | 3.79 | 4.49 | 4 | 5
3 | 12 | 3.92 | .289 | .083 | 3.73 | 4.10 | 3 | 4
Total | 31 | 4.03 | .315 | .056 | 3.92 | 4.15 | 3 | 5

Online status checking
1 | 12 | 4.17 | .577 | .167 | 3.80 | 4.53 | 3 | 5
2 | 7 | 4.14 | .378 | .143 | 3.79 | 4.49 | 4 | 5
3 | 12 | 3.75 | .622 | .179 | 3.36 | 4.14 | 3 | 5
Total | 31 | 4.00 | .577 | .104 | 3.79 | 4.21 | 3 | 5

Online product/service support
1 | 12 | 4.17 | .389 | .112 | 3.92 | 4.41 | 4 | 5
2 | 7 | 4.57 | .535 | .202 | 4.08 | 5.07 | 4 | 5
3 | 12 | 3.83 | .389 | .112 | 3.59 | 4.08 | 3 | 4
Total | 31 | 4.13 | .499 | .090 | 3.95 | 4.31 | 3 | 5

ANOVA
Source | Sum of Squares | df | Mean Square | F | Sig.

Collecting product/service information
Between Groups | .111 | 2 | .055 | .542 | .588
Within Groups | 2.857 | 28 | .102
Total | 2.968 | 30

Collecting information of current vendor
Between Groups | 14.458 | 2 | 7.229 | 16.348 | .000
Within Groups | 12.381 | 28 | .442
Total | 26.839 | 30

Searching and collecting information of new vendor
Between Groups | .380 | 2 | .190 | .729 | .491
Within Groups | 7.298 | 28 | .261
Total | 7.677 | 30

Collecting competitive and other information for purchase
Between Groups | .364 | 2 | .182 | .611 | .550
Within Groups | 8.345 | 28 | .298
Total | 8.710 | 30

Cost/price comparison
Between Groups | 2.679 | 2 | 1.339 | 4.633 | .018
Within Groups | 8.095 | 28 | .289
Total | 10.774 | 30

Email
Between Groups | 1.718 | 2 | .859 | 3.993 | .030
Within Groups | 6.024 | 28 | .215
Total | 7.742 | 30

Web conferencing
Between Groups | 1.577 | 2 | .788 | 2.383 | .111
Within Groups | 9.262 | 28 | .331
Total | 10.839 | 30

Electronic Data Interchange
Between Groups | 8.954 | 2 | 4.477 | 9.705 | .001
Within Groups | 12.917 | 28 | .461
Total | 21.871 | 30

Discussion Groups
Between Groups | 20.019 | 2 | 10.009 | 47.368 | .000
Within Groups | 5.917 | 28 | .211
Total | 25.935 | 30

Just-in-time inventory planning
Between Groups | 5.912 | 2 | 2.956 | 5.902 | .007
Within Groups | 14.024 | 28 | .501
Total | 19.935 | 30

Online negotiation
Between Groups | 11.160 | 2 | 5.580 | 20.348 | .000
Within Groups | 7.679 | 28 | .274
Total | 18.839 | 30

Online bidding
Between Groups | 8.347 | 2 | 4.174 | 7.528 | .002
Within Groups | 15.524 | 28 | .554
Total | 23.871 | 30

Online payment
Between Groups | 5.036 | 2 | 2.518 | 7.865 | .002
Within Groups | 8.964 | 28 | .320
Total | 14.000 | 30

Online ordering
Between Groups | .277 | 2 | .139 | 1.443 | .253
Within Groups | 2.690 | 28 | .096
Total | 2.968 | 30

Online status checking
Between Groups | 1.226 | 2 | .613 | 1.957 | .160
Within Groups | 8.774 | 28 | .313
Total | 10.000 | 30

Online product/service support
Between Groups | 2.436 | 2 | 1.218 | 6.757 | .004
Within Groups | 5.048 | 28 | .180
Total | 7.484 | 30

The ANOVA table above tests whether the cluster means differ for each variable. The null
hypothesis is that there is no difference between the clusters for the given variable. Variables whose
significance level exceeds 5% (Sig. > 0.05) do not vary significantly across the clusters.

The six non-significant variables are:

1. Collecting Product/service information
2. Searching and collecting information of new vendor
3. Collecting competitive and other information for purchase
4. Web conferencing
5. Online ordering
6. Online status checking

We again perform the hierarchical cluster analysis excluding these six variables.
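As a check on the ANOVA table, each F value is the between-groups mean square divided by the within-groups mean square, with df = 3 - 1 = 2 and 31 - 3 = 28. The minimal Python sketch below illustrates this calculation; it is not the authors' procedure. It recomputes F and Sig. from the sums of squares and lists the variables with Sig. > 0.05. To avoid a statistics library, it uses the closed-form upper tail of the F distribution, which holds only when the numerator df is 2 (three clusters). The inputs are rounded to three decimals, so the last digit of Sig. can differ slightly from the SPSS output.

```python
# (variable, between-groups SS, within-groups SS) from the hierarchical ANOVA table.
ANOVA_SS = [
    ("Collecting product/service information", 0.111, 2.857),
    ("Collecting information of current vendor", 14.458, 12.381),
    ("Searching and collecting information of new vendor", 0.380, 7.298),
    ("Collecting competitive and other information for purchase", 0.364, 8.345),
    ("Cost/price comparison", 2.679, 8.095),
    ("Email", 1.718, 6.024),
    ("Web conferencing", 1.577, 9.262),
    ("Electronic Data Interchange", 8.954, 12.917),
    ("Discussion Groups", 20.019, 5.917),
    ("Just-in-time inventory planning", 5.912, 14.024),
    ("Online negotiation", 11.160, 7.679),
    ("Online bidding", 8.347, 15.524),
    ("Online payment", 5.036, 8.964),
    ("Online ordering", 0.277, 2.690),
    ("Online status checking", 1.226, 8.774),
    ("Online product/service support", 2.436, 5.048),
]

N_CASES, N_CLUSTERS = 31, 3
DF_BETWEEN = N_CLUSTERS - 1  # 2
DF_WITHIN = N_CASES - N_CLUSTERS  # 28


def f_test(ss_between, ss_within, df1=DF_BETWEEN, df2=DF_WITHIN):
    """One-way ANOVA F = (SSB / df1) / (SSW / df2) and its p-value.

    The closed-form tail P(F > f) = (1 + 2f / df2) ** (-df2 / 2) is exact
    only for df1 == 2, so any other df1 is rejected.
    """
    if df1 != 2:
        raise ValueError("closed-form p-value needs df1 == 2 (three clusters)")
    f = (ss_between / df1) / (ss_within / df2)
    return f, (1 + 2 * f / df2) ** (-df2 / 2)


def non_significant(rows, alpha=0.05):
    """Variables whose p-value exceeds alpha (cluster means not shown to differ)."""
    return [name for name, ssb, ssw in rows if f_test(ssb, ssw)[1] > alpha]


if __name__ == "__main__":
    for name, ssb, ssw in ANOVA_SS:
        f, p = f_test(ssb, ssw)
        print(f"{name}: F = {f:.3f}, Sig. = {p:.3f}")
    print("Not significant at 5%:", non_significant(ANOVA_SS))
```

The list it prints matches the six variables dropped before the second hierarchical run.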

Dendrogram using Ward Method (Rescaled Distance Cluster Combine, scale 0 to 25)
Cases in dendrogram order, top to bottom: 24, 26, 23, 27, 9, 15, 22, 25, 31, 29, 30, 28, 11, 8, 20, 19, 14, 16, 10, 18, 17, 12, 1, 7, 13, 6, 4, 21, 3, 5, 2

Again, the dendrogram shows that 3 clusters are formed.


ANOVA
Source | Sum of Squares | df | Mean Square | F | Sig.

Collecting information of current vendor
Between Groups | 15.294 | 2 | 7.647 | 18.548 | .000
Within Groups | 11.544 | 28 | .412
Total | 26.839 | 30

Cost/price comparison
Between Groups | 2.197 | 2 | 1.099 | 3.587 | .041
Within Groups | 8.577 | 28 | .306
Total | 10.774 | 30

Online negotiation
Between Groups | 13.916 | 2 | 6.958 | 39.573 | .000
Within Groups | 4.923 | 28 | .176
Total | 18.839 | 30

Online bidding
Between Groups | 5.865 | 2 | 2.933 | 4.560 | .019
Within Groups | 18.006 | 28 | .643
Total | 23.871 | 30

Online payment
Between Groups | 6.494 | 2 | 3.247 | 12.113 | .000
Within Groups | 7.506 | 28 | .268
Total | 14.000 | 30

Online product/service support
Between Groups | 1.661 | 2 | .830 | 3.993 | .030
Within Groups | 5.823 | 28 | .208
Total | 7.484 | 30

Email
Between Groups | 1.073 | 2 | .536 | 2.252 | .124
Within Groups | 6.669 | 28 | .238
Total | 7.742 | 30

Electronic Data Interchange
Between Groups | 9.894 | 2 | 4.947 | 11.565 | .000
Within Groups | 11.977 | 28 | .428
Total | 21.871 | 30

Discussion Groups
Between Groups | 20.291 | 2 | 10.146 | 50.331 | .000
Within Groups | 5.644 | 28 | .202
Total | 25.935 | 30

Just-in-time inventory planning
Between Groups | 5.628 | 2 | 2.814 | 5.507 | .010
Within Groups | 14.308 | 28 | .511
Total | 19.935 | 30

Now all the variables are significant (their means differ between the clusters) except Email (Sig. = .124).
Hence, the 3-cluster solution is a good solution.
Descriptives
Cluster | N | Mean | Std. Deviation | Std. Error | 95% Confidence Interval for Mean: Lower Bound | Upper Bound

Collecting information of current vendor
1 | 10 | 2.90 | .568 | .180 | 2.49 | 3.31
2 | 8 | 4.38 | .518 | .183 | 3.94 | 4.81
3 | 13 | 2.69 | .751 | .208 | 2.24 | 3.15
Total | 31 | 3.19 | .946 | .170 | 2.85 | 3.54

Cost/price comparison
1 | 10 | 4.00 | .000 | .000 | 4.00 | 4.00
2 | 8 | 3.75 | .707 | .250 | 3.16 | 4.34
3 | 13 | 3.38 | .650 | .180 | 2.99 | 3.78
Total | 31 | 3.68 | .599 | .108 | 3.46 | 3.90

Online negotiation
1 | 10 | 4.50 | .527 | .167 | 4.12 | 4.88
2 | 8 | 2.75 | .463 | .164 | 2.36 | 3.14
3 | 13 | 3.92 | .277 | .077 | 3.76 | 4.09
Total | 31 | 3.81 | .792 | .142 | 3.52 | 4.10

Online bidding
1 | 10 | 1.90 | .876 | .277 | 1.27 | 2.53
2 | 8 | 2.63 | 1.061 | .375 | 1.74 | 3.51
3 | 13 | 1.54 | .519 | .144 | 1.22 | 1.85
Total | 31 | 1.94 | .892 | .160 | 1.61 | 2.26

Online payment
1 | 10 | 4.40 | .516 | .163 | 4.03 | 4.77
2 | 8 | 4.38 | .518 | .183 | 3.94 | 4.81
3 | 13 | 3.46 | .519 | .144 | 3.15 | 3.78
Total | 31 | 4.00 | .683 | .123 | 3.75 | 4.25

Online product/service support
1 | 10 | 4.10 | .316 | .100 | 3.87 | 4.33
2 | 8 | 4.50 | .535 | .189 | 4.05 | 4.95
3 | 13 | 3.92 | .494 | .137 | 3.62 | 4.22
Total | 31 | 4.13 | .499 | .090 | 3.95 | 4.31

Email
1 | 10 | 4.60 | .516 | .163 | 4.23 | 4.97
2 | 8 | 4.75 | .463 | .164 | 4.36 | 5.14
3 | 13 | 4.31 | .480 | .133 | 4.02 | 4.60
Total | 31 | 4.52 | .508 | .091 | 4.33 | 4.70

Electronic Data Interchange
1 | 10 | 2.90 | .738 | .233 | 2.37 | 3.43
2 | 8 | 4.00 | .000 | .000 | 4.00 | 4.00
3 | 13 | 2.62 | .768 | .213 | 2.15 | 3.08
Total | 31 | 3.06 | .854 | .153 | 2.75 | 3.38

Discussion Groups
1 | 10 | 3.00 | .471 | .149 | 2.66 | 3.34
2 | 8 | 2.88 | .354 | .125 | 2.58 | 3.17
3 | 13 | 1.31 | .480 | .133 | 1.02 | 1.60
Total | 31 | 2.26 | .930 | .167 | 1.92 | 2.60

Just-in-time inventory planning
1 | 10 | 3.50 | 1.080 | .342 | 2.73 | 4.27
2 | 8 | 3.75 | .463 | .164 | 3.36 | 4.14
3 | 13 | 2.77 | .439 | .122 | 2.50 | 3.03
Total | 31 | 3.26 | .815 | .146 | 2.96 | 3.56

The Descriptives table above summarizes the final three clusters.

Step 2:
K-Means Approach: To find the cluster membership of each case
K-means clustering was then run with the number of clusters fixed at 3 (the solution found in Step 1)
to obtain stable clusters.
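Below is a minimal sketch of the k-means step in Python/NumPy. It assumes Lloyd's algorithm: assign each case to its nearest center, move each center to the mean of its cases, and stop once no center coordinate changes, which mirrors the convergence message SPSS prints (maximum absolute coordinate change of .000). SPSS selects its own initial centers from the data. The sketch instead takes the initial centers as an argument. The nine two-variable "respondents" are hypothetical, and `kmeans_lloyd` is an illustrative name, not an SPSS or library function.

```python
import numpy as np


def kmeans_lloyd(X, init_centers, max_iter=100):
    """Lloyd-style k-means.

    Returns (labels, centers, history), where history[i] holds the Euclidean
    distance each center moved in iteration i + 1.
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    history = []
    for _ in range(max_iter):
        # Squared Euclidean distance from every case to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[j] = members.mean(axis=0)
        history.append(np.linalg.norm(new_centers - centers, axis=1))
        max_coord_change = np.abs(new_centers - centers).max()
        centers = new_centers
        if max_coord_change == 0.0:  # converged: no coordinate moved
            break
    # Final assignment against the final centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centers, history


# Hypothetical ratings (1-5) on two activities for nine respondents.
ratings = np.array([[4, 4], [4, 5], [5, 4],
                    [1, 2], [2, 1], [1, 1],
                    [4, 1], [5, 2], [4, 2]])
labels, centers, history = kmeans_lloyd(ratings, ratings[[0, 3, 6]])
print("cluster sizes:", np.bincount(labels, minlength=3))
print("iterations:", len(history))
```

On this toy data the loop converges in the second iteration, when no center moves. The labels are 0-based, whereas SPSS numbers clusters from 1. `history` plays the role of the "Change in Cluster Centers" columns of the Iteration History table.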

Initial Cluster Centers
Cluster: 1 | 2 | 3
Collecting Product/service information
Collecting information of current vendor
Searching and collecting information of new vendor
Collecting competitive and other information for purchase
Cost/price comparison
Email
Web conferencing
Electronic Data Interchange
Discussion Groups
Just-in-time Inventory planning
Online negotiation
Online bidding
Online payment
Online ordering
Online status checking
Online product/service support

Iteration History(a)
Change in Cluster Centers
Iteration | Cluster 1 | Cluster 2 | Cluster 3
1 | 2.733 | 2.421 | 1.974
2 | .271 | .000 | .194
3 | .000 | .000 | .000
a. Convergence achieved due to no or small change in cluster centers. The maximum
absolute coordinate change for any center is .000. The current iteration is 3. The minimum
distance between initial centers is 4.359.

Final Cluster Centers
Cluster: 1 | 2 | 3
Collecting Product/service information
Collecting information of current vendor
Searching and collecting information of new vendor
Collecting competitive and other information for purchase
Cost/price comparison
Email
Web conferencing
Electronic Data Interchange
Discussion Groups
Just-in-time Inventory planning
Online negotiation
Online bidding
Online payment
Online ordering
Online status checking
Online product/service support

ANOVA
Variable | Cluster Mean Square | Cluster df | Error Mean Square | Error df | F | Sig.
Collecting Product/service information | .046 | 2 | .103 | 28 | .452 | .641
Collecting information of current vendor | 7.609 | 2 | .415 | 28 | 18.333 | .000
Searching and collecting information of new vendor | .120 | 2 | .266 | 28 | .454 | .640
Collecting competitive and other information for purchase | .141 | 2 | .301 | 28 | .467 | .632
Cost/price comparison | .443 | 2 | .353 | 28 | 1.253 | .301
Email | .403 | 2 | .248 | 28 | 1.626 | .215
Web conferencing | 1.812 | 2 | .258 | 28 | 7.034 | .003
Electronic Data Interchange | 6.443 | 2 | .321 | 28 | 20.082 | .000
Discussion Groups | 7.336 | 2 | .402 | 28 | 18.235 | .000
Just-in-time Inventory planning | 4.833 | 2 | .367 | 28 | 13.176 | .000
Online negotiation | 6.205 | 2 | .230 | 28 | 27.027 | .000
Online bidding | 3.839 | 2 | .578 | 28 | 6.639 | .004
Online payment | 2.348 | 2 | .332 | 28 | 7.067 | .003
Online ordering | .138 | 2 | .096 | 28 | 1.431 | .256
Online status checking | .606 | 2 | .314 | 28 | 1.931 | .164
Online product/service support | .742 | 2 | .214 | 28 | 3.462 | .045
The F tests should be used only for descriptive purposes because the clusters have been chosen to maximize the
differences among cases in different clusters. The observed significance levels are not corrected for this and thus
cannot be interpreted as tests of the hypothesis that the cluster means are equal.
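In the k-means ANOVA, the F value is the cluster mean square divided by the error mean square (df 2 and 28). The minimal Python sketch below recomputes F from the two mean squares in the table and reproduces the selection of variables whose Sig. is below 0.05. As the note above says, these p-values are descriptive only. The closed-form tail probability is an assumption that holds only for a 3-cluster solution (numerator df = 2), and the helper names are illustrative.

```python
# (variable, cluster mean square, error mean square) from the k-means ANOVA table.
KMEANS_MS = [
    ("Collecting product/service information", 0.046, 0.103),
    ("Collecting information of current vendor", 7.609, 0.415),
    ("Searching and collecting information of new vendor", 0.120, 0.266),
    ("Collecting competitive and other information for purchase", 0.141, 0.301),
    ("Cost/price comparison", 0.443, 0.353),
    ("Email", 0.403, 0.248),
    ("Web conferencing", 1.812, 0.258),
    ("Electronic Data Interchange", 6.443, 0.321),
    ("Discussion Groups", 7.336, 0.402),
    ("Just-in-time inventory planning", 4.833, 0.367),
    ("Online negotiation", 6.205, 0.230),
    ("Online bidding", 3.839, 0.578),
    ("Online payment", 2.348, 0.332),
    ("Online ordering", 0.138, 0.096),
    ("Online status checking", 0.606, 0.314),
    ("Online product/service support", 0.742, 0.214),
]
DF_ERROR = 28  # 31 cases - 3 clusters; the cluster df is 3 - 1 = 2


def f_and_p(ms_cluster, ms_error, df_error=DF_ERROR):
    """F ratio and its upper-tail probability for numerator df = 2.

    P(F > f) = (1 + 2f / df_error) ** (-df_error / 2) is exact only when
    the numerator df is 2, i.e. for a 3-cluster solution.
    """
    f = ms_cluster / ms_error
    return f, (1 + 2 * f / df_error) ** (-df_error / 2)


def differing_variables(rows, alpha=0.05):
    """Variables whose (descriptive) Sig. falls below alpha."""
    return [name for name, ms_c, ms_e in rows if f_and_p(ms_c, ms_e)[1] < alpha]


if __name__ == "__main__":
    for name, ms_c, ms_e in KMEANS_MS:
        f, p = f_and_p(ms_c, ms_e)
        print(f"{name}: F = {f:.3f}, Sig. = {p:.3f}")
    print("Sig. < 0.05:", differing_variables(KMEANS_MS))
```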

The ANOVA indicates that the clusters differ only on certain activities, such as collecting
information of the current vendor, web conferencing, Electronic Data Interchange and discussion groups,
since the significance is below 0.05 only for the following variables (keeping in mind that these F tests are descriptive only):

Variable | Sig.
Collecting information of Current Vendor | 0.000
Web Conferencing | 0.003
Electronic Data Interchange | 0.000
Discussion Groups | 0.000
Just in time inventory planning | 0.000
Online negotiation | 0.000
Online bidding | 0.004
Online payment | 0.003
Online product/service support | 0.045

Number of Cases in each Cluster
Cluster 1 | 9.000
Cluster 2 | 8.000
Cluster 3 | 14.000
Valid | 31.000
Missing | .000
