
Introduction

This report studies some consumer shopping habits. It is based on statistical analysis of
quantitative survey data.

About the Data

The data used in this report come from the research project “Retail Competition and Consumer
Choice”. The data were collected between 2002 and 2004 in Portsmouth, UK. The principal
investigators are Ian Clarke (AIM and Lancaster University Management School), Peter Jackson
(University of Sheffield) and Alan Hallsworth (Manchester Metropolitan University). The research was
funded by the Economic & Social Research Council (ESRC). The distributor of the data is the UK Data
Archive, University of Essex. The data were downloaded from the Economic and Social Data Service
website.

The data were collected in four stages, producing two quantitative and two qualitative data sets.
Phase I data were collected by interviewing 2,515 consumers at the main food stores (seven stores at
the time of the research). This phase explored the characteristics of the shopper group, shopping
travel time and mode, and shopping behaviour. Phase II was an attitudinal survey in which 2,150
questionnaires were distributed to be completed at home. This phase explored views on grocery
shopping, choice criteria and attitudes to particular stores. 430 participants responded by returning
completed questionnaires.

The scope of this report is limited to the quantitative data (Phases I and II). The questionnaires
used and the raw data sets can be found in the appendix.

Important Characteristics of the Data

This section analyzes the main categorical and continuous variables. It begins with the main
categorical variables, age and gender, and ends with income level for both data sets. The gender by
age group distribution of the first data set (in store) is shown below. Females clearly outnumber
males, and the 45-60 age group was the largest group interviewed in store.

          up to 24   25-34   35-44   45-60   over 60   Total   % of total

Male            56      70     136     239       223     724        29.45
Female          85     226     432     586       405    1734        70.55
Total          141     296     568     825       628    2458       100.00
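
The row totals and percentages in the in store table can be re-derived from the cell counts. The following is a minimal sketch in Python, not part of the original SPSS analysis; the variable and function names are illustrative assumptions, and the counts are copied from the table above.

```python
# Cell counts copied from the in store gender by age group table.
AGE_GROUPS = ["up to 24", "25-34", "35-44", "45-60", "over 60"]
counts = {
    "Male": [56, 70, 136, 239, 223],
    "Female": [85, 226, 432, 586, 405],
}


def row_totals(table):
    """Number of interviewees per gender."""
    return {gender: sum(values) for gender, values in table.items()}


def column_totals(table):
    """Number of interviewees per age group (both genders)."""
    return [sum(column) for column in zip(*table.values())]


def percent_of_total(table):
    """Share of each gender in the whole sample, in percent."""
    totals = row_totals(table)
    grand_total = sum(totals.values())
    return {g: round(100 * t / grand_total, 2) for g, t in totals.items()}


shares = percent_of_total(counts)
by_age = column_totals(counts)
largest_group = AGE_GROUPS[by_age.index(max(by_age))]

print(row_totals(counts))
print(shares)
print(largest_group)
```

The same functions can be applied to the in house counts by swapping in that table's cell values.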

The gender by age group distribution of the second data set (in house) is shown below. Females again
responded more than males. However, respondents were concentrated in two age groups, “45-60” and
“over 60”.

          up to 24   25-34   35-44   45-60   over 60   Total   % of total

Male             2       2      13      19        45      81        19.29
Female           3      34      72     114       116     339        80.71
Total            5      36      85     133       161     420       100.00

Another characteristic that can be compared between the two data sets is household income level. In
the first data set, interviewees were asked directly to describe their own income level. In the
second data set, by contrast, the respondents’ living area was used to categorize them.

Store visitor income level frequency distribution

                        Frequency   Percent
low income                    339      13.5
low/middle income             894      35.6
middle income                1004      40.0
high income                    64       2.5
middle/high income            130       5.2
very rich                      79       3.1
Total                        2510     100.0

In house respondent income level frequency distribution

                          Frequency   Percent
low income/Paulsgrove            99      23.0
low/middle/Purbrook              93      21.6
middle/Drayton                   64      14.9
high/Cavendish                   84      19.5
middle/high/Farlington           39       9.1
very rich                        51      11.9
Total                           430     100.0

Most of the store visitors are middle income shoppers. Together, the middle and low/middle
categories account for more than 75% of the customers. The second frequency distribution suggests
that the questionnaire was distributed across areas of differing income, or at least that responses
were received from a range of income levels.
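
The “more than 75%” figure can be checked directly from the in store frequencies. This is a small illustrative sketch; the dictionary keys and the helper name `combined_share` are assumptions made for the example, and the frequencies are those reported in the in store income table.

```python
# Frequencies from the in store income level distribution.
in_store_income = {
    "low": 339,
    "low/middle": 894,
    "middle": 1004,
    "high": 64,
    "middle/high": 130,
    "very rich": 79,
}


def combined_share(frequencies, categories):
    """Percentage of all cases that fall in the given categories."""
    total = sum(frequencies.values())
    selected = sum(frequencies[name] for name in categories)
    return 100 * selected / total


share = combined_share(in_store_income, ["low/middle", "middle"])
print(f"{share:.1f}% of store visitors are low/middle or middle income")
```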

For the continuous variables, normality is assessed. The in store data set contains two continuous
variables: money spent on food and money spent on other shopping in the store. The in house data set
contains no continuous variables. The descriptive statistics used to assess the normality of the two
variables follow:

Descriptives

                                                                   Statistic   Std. Error

Money spent on food    Mean                                            32.79         .593
                       95% Confidence Interval for Mean, Lower Bound   31.63
                       95% Confidence Interval for Mean, Upper Bound   33.96
                       5% Trimmed Mean                                 29.97
                       Median                                          25.00
                       Variance                                      884.371
                       Std. Deviation                                 29.738
                       Skewness                                        1.474         .049

Money spent on other   Mean                                             2.95         .207
                       95% Confidence Interval for Mean, Lower Bound    2.54
                       95% Confidence Interval for Mean, Upper Bound    3.35
                       5% Trimmed Mean                                  1.35
                       Median                                            .00
                       Variance                                      108.252
                       Std. Deviation                                 10.404
                       Skewness                                       12.038         .049
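
Several entries in the descriptives table follow from one another, which gives a quick consistency check. The sketch below is not taken from the original analysis. It assumes N = 2515 valid cases for both variables (the interview count stated earlier; the table itself does not give N), uses a normal approximation (z = 1.96) for the confidence interval where SPSS uses the t distribution, and uses the usual large-sample formula for the standard error of skewness. Function names are illustrative.

```python
import math

N = 2515  # assumption: all Phase I interviewees have valid spending values


def standard_error(sd, n):
    """Standard error of the mean."""
    return sd / math.sqrt(n)


def normal_ci(mean, se, z=1.96):
    """Approximate 95% confidence interval for the mean."""
    return mean - z * se, mean + z * se


def skewness_standard_error(n):
    """Standard error of the sample skewness statistic."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


se_food = standard_error(29.738, N)
ci_food = normal_ci(32.79, se_food)
se_other = standard_error(10.404, N)
se_skew = skewness_standard_error(N)

# Skewness divided by its standard error; values far above about 2
# indicate skewness that is clearly different from zero.
z_skew_food = 1.474 / se_skew
z_skew_other = 12.038 / se_skew

print(round(se_food, 3), round(se_other, 3), round(se_skew, 3))
print(tuple(round(bound, 2) for bound in ci_food))
print(round(z_skew_food, 1), round(z_skew_other, 1))
```

Under these assumptions the recomputed standard errors round to the values in the table, and both skewness ratios are far larger than 2, which is consistent with reading the skewness as substantial.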

Both variables have high positive skewness, which means that scores on both variables are clustered
at the low end, with a long tail of high values. Neither variable is therefore normally distributed,
and for skewed data like these the median gives a better idea of a typical value than the mean. The
histograms below show this more clearly.

The histograms show the actual distribution of the two variables, which is clearly not normal. This
is also supported by inspection of the normal probability plots. The plots for both variables
follow:

The histogram of “money spent on other” shows that many scores are equal to zero. We therefore
create a new variable, “total money spent”, which adds the two amounts together. The descriptive
statistics for the new variable follow:

                                                         Statistic   Std. Error

SpentTotal   Mean                                          35.7391       .64713
             95% Confidence Interval for Mean, Lower Bound 34.4701
             95% Confidence Interval for Mean, Upper Bound 37.0081
             5% Trimmed Mean                               32.5853
             Median                                        27.0000
             Variance                                     1053.229
             Std. Deviation                               32.45348
             Interquartile Range                             39.00
             Skewness                                        2.144         .049

The shape of the histogram does not look different from that of “money spent on food”. However, it
is less peaked.

The last descriptive observation concerns the difference between male and female shoppers in terms
of spending. The new variable “Total Spent” is used here. The following bar graph gives a rough
idea:

We cannot neglect the fact that females represent more than 70% of the in store interviewees.
However, the graph gives an indication that female shoppers tend to spend more than male shoppers.

Tests of the Data

In this section the data set is tested in order to draw some findings. The tests used are
correlation, the Mann-Whitney U test and one-way ANOVA.

Test of Correlation

The first test is the correlation between the total money spent and the time spent in the store. The
null hypothesis is that there is no correlation (the correlation coefficient is zero). The
alternative hypothesis is that the correlation coefficient is not zero. To summarize:
H0: There’s no correlation; the correlation coefficient is zero.
Ha: The correlation coefficient is not zero.

Running the two-tailed Pearson correlation test in SPSS gives the following output:

Correlations

                                    SpentTotal   Time Spent

SpentTotal   Pearson Correlation             1       .570**
             Sig. (2-tailed)                           .000
             N                            2515         2509
Time Spent   Pearson Correlation        .570**            1
             Sig. (2-tailed)              .000
             N                            2509         2509
**. Correlation is significant at the 0.01 level (2-tailed).

The first thing to check is N, the number of cases: 2,509 of the 2,515 cases were used, so only 6
cases were excluded. Secondly, we check the r-value, the correlation coefficient, which is .570.
Since it is positive, the correlation is positive: the two variables increase together. The
significance value is reported as .000, which means p < .0005. The null hypothesis is therefore
rejected in favour of the alternative hypothesis. In plain English, we can be quite confident that
shoppers who stay longer in the store tend to spend more money.
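
Two further quantities can be derived from the reported coefficient: the proportion of variance the two variables share (r squared) and the t statistic on which the significance test of a Pearson correlation is based. The sketch below is an illustration added here, not SPSS output; it takes r = .570 and N = 2509 from the correlation table, and the function names are assumptions.

```python
import math


def shared_variance(r):
    """Proportion of variance shared by the two variables."""
    return r * r


def correlation_t_statistic(r, n):
    """t statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


r = 0.570
n = 2509

r2 = shared_variance(r)
t = correlation_t_statistic(r, n)

print(f"r squared = {r2:.4f}")
print(f"t({n - 2}) = {t:.2f}")
```

With these inputs, time spent in the store accounts for roughly a third of the variance in total spending, and the t statistic is far beyond any conventional critical value, which agrees with the reported significance of .000.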

Mann-Whitney U Test

The second test is the Mann-Whitney U test. This is a non-parametric test used to compare the
medians of two independent groups on a continuous measure. It is used here to determine whether
female shoppers tend to spend more money in total. The null hypothesis is that male and female
shoppers have the same median. The alternative hypothesis is that one of the two groups tends to
spend more. To summarize:
H0: There’s no difference; the two groups have the same median.
Ha: The two group medians are not the same.

Running the Mann-Whitney U test in SPSS gives the following output:

Ranks

             gender      N    Mean Rank   Sum of Ranks

SpentTotal   Male       725     1113.78         807487
             Female    1739        1282        2229393
             Total     2464

Test Statistics (a)

                          SpentTotal

Mann-Whitney U            544312.000
Wilcoxon W                807487.000
Z                             -5.351
Asymp. Sig. (2-tailed)          .000

a. Grouping Variable: Respondent gender

Obtaining the median scores for each group gives the following output:

Respondent gender      N     Median

Male                  725   22.0000
Female               1739   30.0000
Total                2464   27.0000

The effect size r can be calculated as r = Z / SQRT(N) = -5.351 / SQRT(2464) = -0.1078, which is a
small effect by Cohen’s conventional criteria (.1 small, .3 medium, .5 large). The Mann-Whitney U
test revealed a significant difference between the two medians (Z = -5.351, p < .0005), with a
median of 30.00 for females and 22.00 for males. The null hypothesis is therefore rejected in favour
of the alternative hypothesis.
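
The reported test statistics can be traced back to the rank sums. The sketch below is an illustration, not the SPSS procedure itself: it derives U from the male rank sum, approximates Z with the standard normal approximation without a correction for ties (SPSS applies a tie correction, so a small discrepancy from -5.351 is expected), and computes the effect size r from the reported Z. Function names are assumptions.

```python
import math


def u_from_rank_sum(rank_sum, n):
    """Mann-Whitney U for a group with the given rank sum and size."""
    return rank_sum - n * (n + 1) / 2


def z_normal_approximation(u, n1, n2):
    """Large-sample Z for U, ignoring any correction for tied ranks."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mean_u) / sd_u


def effect_size_r(z, n_total):
    """Effect size r = Z / sqrt(N)."""
    return z / math.sqrt(n_total)


n_male, n_female = 725, 1739
u = u_from_rank_sum(807487, n_male)
z_approx = z_normal_approximation(u, n_male, n_female)
r = effect_size_r(-5.351, n_male + n_female)  # uses the Z reported by SPSS

print(u)
print(round(z_approx, 3))
print(round(r, 4))
```

The recomputed U matches the table exactly, and the uncorrected Z lands within a few thousandths of the reported value.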

One-way ANOVA Test

The third test is one-way ANOVA. It is used here to determine whether “total money spent” differs
between the age groups of the sample. To summarize:
H0: There’s no difference; all the groups have the same mean. µ1 = µ2 = µ3 = µ4 = µ5
Ha: The group means are not all the same.

The descriptive statistics follow:

                                                        95% Confidence Interval for Mean
              N     Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum

up to 24     142   24.25            28.77         2.41         19.48         29.03       .00    210.00
25-34        301   37.70            31.32         1.81         34.15         41.25       .00    178.00
35-44        577   45.61            40.51         1.69         42.30         48.92       .00    435.00
45-60        835   36.76            31.59         1.09         34.62         38.91       .00    190.00
over 60      654   27.17            22.40          .88         25.45         28.89       .00    134.00
Total       2509   35.70            32.45          .65         34.43         36.97       .00    435.00

Running one-way ANOVA gives the following output:


                  Sum of Squares     df   Mean Square        F   Sig.

Between Groups        124963.376      4     31240.844   31.090   .000
Within Groups        2516114.935   2504      1004.838
Total                2641078.311   2508
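
The F ratio in the ANOVA table is the between-groups mean square divided by the within-groups mean square, and both follow from the sums of squares and degrees of freedom. The sketch below recomputes them from the reported values. It also reports eta squared (between-groups sum of squares over total sum of squares) as a measure of effect size; eta squared is an addition made here for illustration and does not appear in the original output. Function names are assumptions.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


def eta_squared(ss_between, ss_total):
    """Proportion of total variance attributable to group membership."""
    return ss_between / ss_total


SS_BETWEEN = 124963.376
SS_WITHIN = 2516114.935
K_GROUPS = 5    # five age groups
N_CASES = 2509  # cases with a valid age group and spending value

df_between = K_GROUPS - 1
df_within = N_CASES - K_GROUPS

ms_between, ms_within, f_ratio = anova_f(SS_BETWEEN, df_between, SS_WITHIN, df_within)
eta2 = eta_squared(SS_BETWEEN, SS_BETWEEN + SS_WITHIN)

print(df_between, df_within)
print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 3))
print(round(eta2, 4))
```

If eta squared is read this way, age group accounts for a little under 5% of the variance in total spending, so the difference is statistically clear but modest in size.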

From the above we can conclude that there is a significant difference among the group means
(F(4, 2504) = 31.090, p < .0005). The null hypothesis is therefore rejected in favour of the
alternative hypothesis. The following table (multiple comparisons) shows the significant differences
in full detail:

                                                                       95% Confidence Interval
(I) Age Group   (J) Age Group   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound

up to 24        25-34                         -13.45*         3.23   .000        -22.26         -4.64
                35-44                         -21.36*         2.97   .000        -29.46        -13.25
                45-60                         -12.51*         2.88   .000        -20.37         -4.66
                over 60                        -2.92          2.93   .858        -10.93          5.09
25-34           up to 24                       13.45*         3.23   .000          4.64         22.26
                35-44                          -7.91*         2.25   .004        -14.06         -1.76
                45-60                            .94          2.13   .992         -4.88          6.75
                over 60                        10.53*         2.21   .000          4.50         16.55
35-44           up to 24                       21.36*         2.97   .000         13.25         29.46
                25-34                           7.91*         2.25   .004          1.76         14.06
                45-60                           8.84*         1.72   .000          4.16         13.53
                over 60                        18.44*         1.81   .000         13.49         23.38
45-60           up to 24                       12.51*         2.88   .000          4.66         20.37
                25-34                           -.94          2.13   .992         -6.75          4.88
                35-44                          -8.84*         1.72   .000        -13.53         -4.16
                over 60                         9.59*         1.66   .000          5.07         14.11
over 60         up to 24                        2.92          2.93   .858         -5.09         10.93
                25-34                         -10.53*         2.21   .000        -16.55         -4.50
                35-44                         -18.44*         1.81   .000        -23.38        -13.49
                45-60                          -9.59*         1.66   .000        -14.11         -5.07
*. The mean difference is significant at the 0.05 level.
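
The mean differences in the multiple comparisons table are simply differences between the group means in the descriptive statistics, so the two tables can be cross-checked. The sketch below does this and lists the pairs whose reported significance is not below .05. It does not recompute the post hoc significance values themselves; those are copied from the table, and the names used are illustrative assumptions.

```python
from itertools import combinations

# Group means from the ANOVA descriptive statistics.
group_means = {
    "up to 24": 24.25,
    "25-34": 37.70,
    "35-44": 45.61,
    "45-60": 36.76,
    "over 60": 27.17,
}

# Significance values copied from the multiple comparisons table.
reported_sig = {
    ("up to 24", "25-34"): 0.000,
    ("up to 24", "35-44"): 0.000,
    ("up to 24", "45-60"): 0.000,
    ("up to 24", "over 60"): 0.858,
    ("25-34", "35-44"): 0.004,
    ("25-34", "45-60"): 0.992,
    ("25-34", "over 60"): 0.000,
    ("35-44", "45-60"): 0.000,
    ("35-44", "over 60"): 0.000,
    ("45-60", "over 60"): 0.000,
}


def pairwise_differences(means):
    """Mean difference (I - J) for every unordered pair of groups."""
    return {
        (a, b): round(means[a] - means[b], 2)
        for a, b in combinations(means, 2)
    }


def non_significant_pairs(sig, alpha=0.05):
    """Pairs whose reported significance is not below alpha."""
    return [pair for pair, p in sig.items() if p >= alpha]


differences = pairwise_differences(group_means)
not_different = non_significant_pairs(reported_sig)

for pair, diff in differences.items():
    print(pair, diff)
print(not_different)
```

Because the group means are rounded to two decimals, a recomputed difference can be off by .01 from the table (for example 35-44 versus 45-60). Only two pairs are not significantly different: “up to 24” versus “over 60”, and “25-34” versus “45-60”.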

Conclusion

The tests carried out in this report describe only some characteristics of the data. Relevant
information such as the size of the shopping group and the time travelled for shopping was not
tested. There is a positive correlation between the time spent in store and the total money spent.
Females in the research sample appeared to spend more money on shopping than males. Finally,
different age groups seem to have significantly different spending habits.
