
Table of Contents

A. Multivariate Data Analysis
   Data
      Types of data
         1. Time series data
         2. Cross-sectional data
         3. Panel/longitudinal data
   Measurement scales
      1) Nonmetric (qualitative): nominal scale, ordinal scale
      2) Metric (quantitative): interval scale, ratio scale
   Measurement error and multivariate measurement
      Validity
      Reliability
   Types of statistical error and power
   Types of multivariate techniques
      Dependence techniques
      Interdependence techniques
   INTERPRETATION
   CASE SCREENING
      Data screening by case and variable

B. EXPLORATORY FACTOR ANALYSIS
   Exploratory Factor Analysis
   Assumptions for EFA
   Principal component analysis
   Factor loadings
   How many factors should be retained?
   Factor rotation
   Orthogonal and oblique rotation
   Bartlett's test
   MANOVA
   ANOVA
   INTERPRETATION

C. Structural Equation Modeling
   1. Measurement model
      CFA assumptions
   2. Structural modeling
   How does SEM work?
      Step 1: Model specification
      Step 2: Data characteristics
      Step 3: Model estimation
      Step 4: Model evaluation
      Step 5: Model modification, alternative models and equivalent models

D. Confirmatory Factor Analysis
   Parameters
   Interpretation
      Model fit
      Indices
      Reliabilities

E. Feel of data
   Step 1: Multiple graphs
   Step 2: Descriptive analysis of the data
   Step 3: Exploration of assumptions
      Normality (Q-Q plots: Info, Aty, Play, Quality, Joy, Useful, Case no.)
      Homogeneity of variance
      Independence
   Step 4: Correlation analysis
      Types of correlation
   INTERPRETATION
      Bivariate correlation
      Partial correlation

F. Difference of means analysis / Variance analysis
   Parametric test example
   Non-parametric test example

G. Multivariate regression
   Regression
   Regression analysis
   Regression coefficient
   Residuals
   Assessing the goodness of fit: sum of squares, R and R²
   Conditions to apply regression analysis
   The assumptions for better results of regression
   Regression plots
   Multicollinearity tests
   Simple regression on SPSS
A. Multivariate Data Analysis:
A variate is a linear combination of variables, and data is a collection of variables. Multivariate analysis therefore involves the observation and analysis of more than one statistical outcome variable at a time.

Data:
Data are the facts and figures on which further analysis is performed.

Types of data
There are three types of data:

1. Time series data:


Data collected over a specific time frame, e.g. monthly or yearly. In time series data, previous values generate new values. It is mostly secondary data, used for example to calculate a country's GDP or GNP, or to track interest rates.

2. Cross-sectional data:
Data collected at a single point in time from multiple areas or cross-sections, for example a ratio analysis of all countries, or the motivation or satisfaction level of employees.

3. Panel/longitudinal data:
Data collected at multiple points in time from multiple cross-sections. For example, to calculate the GDP or GNP of 5 countries from 2001 to 2005 we take data at multiple times from multiple areas.

Measurement scales:
Identifying and measuring variation in variables is essential: researchers cannot detect variation unless a variable can be measured. Data can be classified into two categories:

1) Nonmetric (qualitative): includes information that is ranked, which is called ordinal, and information that has no order or linear pattern to it, which is called nominal or categorical.

2) Metric (quantitative): includes information measured in amounts where the distances between values are meaningful, i.e. interval and ratio data (e.g. temperature, income, age in years).

• The nonmetric measurement scale has two types, nominal and ordinal.

Nominal scale: used to label subjects or objects so they can be identified easily, e.g. the value 1 is assigned to male and the value 2 to female.

Ordinal scale: provides no measure of the actual amount or magnitude in absolute terms, only the order of the values, e.g. a Likert scale with levels such as strongly agree = 1, agree = 2, neutral = 3, disagree = 4, strongly disagree = 5.

• The metric measurement scale has two further types:

Interval scale: the difference between two values on the scale is an actual and equal distance. For example, the difference between 68°F and 58°F is exactly the same as the difference between 101°F and 91°F.

Ratio scale: allows the researcher to compare intervals or differences and possesses a true zero point (character of origin).

For example, age measured in years is a ratio variable, since zero years is a true origin. Note that once age is grouped into brackets, as below, the variable becomes ordinal:

Please select which age bracket you fall in:

Below 20 years

21-30 years

31-40 years

41 years or above.

Understanding measurement scales is important for two reasons:

1) So that nonmetric data are not incorrectly treated as metric data, and vice versa.
2) To identify which multivariate technique will be most applicable to the data.

Measurement error and multivariate measurement:
Two important characteristics of a measure are validity and reliability.

Validity: the degree to which an instrument accurately measures what it is supposed to measure. Validity has three further types:

1. Content Validity and Face Validity.


2. Criterion-Oriented or Predictive Validity.
3. Concurrent Validity (Test with already established tool)

Reliability: the degree to which the observed variables measure the true value and are free of error. It has four further types:

1. Test-retest Reliability.
2. Equivalent-Forms or Alternate-Forms Reliability.
3. Split-Half Reliability.
4. Rationale Equivalence Reliability.

Types of statistical error and power:


• Type I error, or α: the probability of rejecting the null hypothesis when it is true.

• Type II error, or β: the probability of failing to reject the null hypothesis when it is false.

• Power, or 1-β: the probability of rejecting the null hypothesis when it is false.

                     H0 true              H0 false
Fail to reject H0    1-α (correct)        β (Type II error)
Reject H0            α (Type I error)     1-β (Power)

Types of multivariate techniques:


Dependence techniques: a variable or set of variables is identified as the dependent variable
to be predicted or explained by other variables known as independent variables.

• Multiple Regression: a single metric dependent variable is predicted by several metric independent variables.
• Multiple Discriminant Analysis: a single nonmetric (categorical) dependent variable is predicted by several metric independent variables (e.g. male or female, purchaser or non-purchaser).

• Logit/Logistic Regression: a single nonmetric dependent variable is predicted by several metric independent variables. This technique is similar to discriminant analysis, but relies on calculations more like regression.
• Multivariate Analysis of Variance (MANOVA) and Covariance: several metric dependent variables are predicted by a set of nonmetric (categorical) independent variables.
• Conjoint Analysis: used to understand respondents' preferences for products and services. In doing this, it determines the importance of both attributes and levels of attributes based on a smaller subset of combinations of attributes and levels.
• Canonical Correlation: several metric dependent variables are predicted by several metric independent variables.
• Structural Equation Modeling (SEM): estimates multiple, interrelated dependence relationships among constructs in a single model (detailed in section C).

Interdependence techniques: involve the simultaneous analysis of all variables in the set,
without distinction between dependent variables and independent variables.

• Principal Components and Common Factor Analysis: analyzes the structure of the interrelationships among a large number of variables to determine a set of common underlying dimensions (factors).
• Cluster Analysis: groups objects (respondents, products, firms, variables, etc.) so that each object is similar to the other objects in its cluster and different from the objects in all other clusters.
• Multidimensional Scaling (perceptual mapping): identifies "unrecognized" dimensions.
• Correspondence Analysis: uses nonmetric data and evaluates either linear or non-linear relationships in an effort to develop a perceptual map representing the association between objects (firms, products, etc.) and a set of descriptive characteristics of the objects.

There is a structured approach to multivariate model building.

Stage 1: Define the research problem, objectives, and multivariate technique(s) to be used.

Stage 2: Develop the analysis plan.

Stage 3: Evaluate the assumptions underlying the multivariate technique(s).

Stage 4: Estimate the multivariate model and assess overall model fit.

Stage 5: Interpret the variate(s).

Stage 6: Validate the multivariate model.

INTERPRETATION

CASE SCREENING:
We screen the data to remove missing values and to find unengaged responses and outliers. If any cases (rows/respondents) did not respond, we remove them from the data.

Data screening by case and variable:


Open the data view in SPSS, copy all the data and paste it into Excel. Then go back to SPSS, copy the name column from the variable view, and paste it transposed into Excel as a header row. Next, make a new column for missing values (MV). In its first cell enter =COUNTBLANK() over the respondent's full row of answers and press Enter; this gives the number of missing values in that respondent's response. Double-click the fill handle of the MV cell so the formula fills down for every respondent, then select the MV column and sort it from Z to A.

After this, make a new column for unengaged responses: a person who responds with the same value to every single question. It is found with =STDEV.P() over the respondent's quantitative responses; a standard deviation of zero means the respondent gave the same answer throughout. To compute it for all responses, move the cursor to the corner of the first cell until it turns into a plus sign and double-click; all the results fill in automatically. Then sort this column from A to Z.
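The same screening can be scripted instead of done by hand in Excel. Below is a minimal pandas sketch of the two steps, assuming the responses sit in a hypothetical survey.csv with ID and Expact columns alongside the items; the file and column names are assumptions.

import pandas as pd

# Load the survey responses (hypothetical file and column names).
df = pd.read_csv("survey.csv")
item_cols = [c for c in df.columns if c not in ("ID", "Expact")]

# Missing values per respondent (the =COUNTBLANK step).
df["MV"] = df[item_cols].isna().sum(axis=1)

# Unengaged responses (the =STDEV.P step): a respondent who gives the
# same answer to every item has a row standard deviation of zero.
df["row_sd"] = df[item_cols].std(axis=1, ddof=0)

# Sort and inspect, as with the Z-to-A / A-to-Z sorts in Excel.
flagged = df[(df["MV"] > 0) | (df["row_sd"] == 0)]
print(flagged[["MV", "row_sd"]].sort_values("MV", ascending=False))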

Open SPSS and follow these steps:

Analyze → Multiple Imputation → Analyze Patterns

A window appears. Select all the variables except ID and Expact. In Output, tick all the options, type 0 in the minimum percentage of missing values, and press OK. We can impute the data because the output shows only a small proportion of missing values.

Analyze → Missing Value Analysis

A window will appear. Move all variables measured on a Likert or higher scale (e.g. the independent and dependent variables) to the quantitative column, and those measured on ordinal and nominal scales to the categorical column. On the right side select EM. In the EM window select Student's t and enter 5 degrees of freedom. Save the completed data with a file name without spaces (dataaftermissingvariables), then Continue.

Output:
At the bottom of the univariate statistics table, Little's MCAR test is reported. Its null hypothesis is that the data are missing completely at random, so we want a significance value greater than 0.05. Our value is 0.145, so the data show no pattern: with p = 0.145 the null is retained and the alternative is rejected.

EM estimated statistics:

Little's MCAR test: Chi-Square = 274.773, DF = 251, Sig. = .145

Outliers:

An outlier is an observation point that is distant from other observations. Outliers can cause your
model to be biased because they affect the values of the estimated regression coefficients.

Analyze → Descriptive Statistics → Explore

A new window will appear; move all the variables to the Dependent List. Then go to Statistics, select only Outliers, and press Continue.

Output:
In the stem-and-leaf plot, values marked with an asterisk (*) are extreme cases; here they indicate respondents whose age is greater than the other respondents'. If that variable matters according to our theory we delete those records; otherwise we continue. Although we found two cases (IDs 378 and 380) that are older than all the other cases, our study is not age-sensitive, so excluding them would not affect our results. We therefore retain them.
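Continuing the earlier pandas sketch, a hedged equivalent of this outlier check uses z-scores; SPSS's stem-and-leaf plot flags the same kind of extreme case. The "ID" and "Age" column names are assumptions.

# Flag cases whose age lies far from the rest of the sample.
z = (df["Age"] - df["Age"].mean()) / df["Age"].std()
print(df.loc[z.abs() > 3, ["ID", "Age"]])   # |z| > 3 is a common cutoff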

B. EXPLORATORY FACTOR ANALYSIS:


Factor analysis is a process for measuring things that cannot be measured directly, called latent variables. Burnout, for example, has many facets, so we measure related factors such as motivation and stress level and test whether they really reflect a single underlying variable.

Exploratory Factor Analysis:


It provides information about the number of factors required to represent the data; every measured variable is allowed to relate to every latent variable. EFA is used to reduce the data to a smaller set, to explore the theoretical structure of the phenomenon, and to find relationships between variables and respondents.

Assumptions for EFA:


1. Metric variables and dummy variables will be used.
2. N>200
3. Homogeneous sample is required.
4. Multivariate normality is not required.
5. Correlation at least 0.3 between research variables.
6. No outliers in data should be present.
7. Ordinal scale will be used.

• Principal component analysis:

Principal component analysis (PCA) is a technique for identifying groups or clusters of variables. Three uses of this technique are:

1) Understand the structure of a set of variables.
2) Construct a questionnaire that measures the underlying variables.
3) Reduce the data set while retaining as much of the original information as possible.

Factor analysis achieves parsimony by explaining the maximum amount of common variance in a correlation matrix using the smallest number of explanatory constructs.

We reduce the data by looking for variables that correlate highly with a group of other variables but do not correlate with variables outside that group. Such a group is called a cluster.

• Factor loadings:
Factor loadings are part of the output of a factor analysis, which serves as a data-reduction method designed to explain the correlations between observed variables using a smaller number of factors. There are two types of coefficients:

1) Correlation coefficient: describes the statistical relationship between two variables.

2) Regression coefficient: the change in the dependent variable corresponding to a unit change in the independent variable.

How many factors should be retained?

• Cumulative percentage of variance and eigenvalue: the retained factors should together explain at least 50% of the variance, and each retained factor's eigenvalue must be greater than 1.

• Scree plot:

In multivariate statistics, a scree plot is a line plot of the eigenvalues of factors or principal
components in an analysis. The scree plot is used to determine the number of factors to retain
in an exploratory factor analysis or principal components to keep in a principal component
analysis.
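A minimal sketch of both retention checks, Kaiser's criterion and the scree plot, assuming the item responses sit in a numeric DataFrame named items (a hypothetical name):

import numpy as np
import matplotlib.pyplot as plt

# Eigenvalues of the item correlation matrix, sorted high to low.
corr = np.corrcoef(items.values, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]

print("Kaiser's criterion retains", (eigenvalues > 1).sum(), "factors")

# Scree plot: look for the point of inflexion.
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, "o-")
plt.axhline(1, linestyle="--")   # eigenvalue = 1 reference line
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.show()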

• Factor rotation:
Once factors have been extracted, we calculate the degree to which each variable loads onto these factors. Generally, most variables load highly on the most important factor and have small loadings on all other factors, which makes interpretation difficult. So a factor rotation technique is used.

• Orthogonal and oblique rotation:
The choice between orthogonal and oblique rotation depends on whether there is a theoretical reason to suppose that the factors should be related or independent, and also on how the variables cluster on the factors before rotation. We use varimax (orthogonal) or promax (oblique) rotation. Delta values should be between 0 and 0.8.

Now it is important to decide which variables make up which factors. Most researchers take factor loadings above 0.3 as significant, but this depends on the sample size. Stevens (2002) recommends interpreting factor loadings with an absolute value greater than 0.4 (which explains 16% of the variance in the variable).

• Bartlett's test:
Used to test whether k samples come from populations with equal variances (homogeneity). It works well with 30 or more variables when communalities are greater than 0.7 for all variables, or with fewer than 20 variables when communalities are below 0.4.

• MANOVA:
Used to find the differences between two or more vectors of means.

• ANOVA:
Used to find the differences in means between two or more groups.

INTERPRETATION
Now we work on the new file saved as "aftermissingvariables". Open that file and perform the following steps. Before applying them, delete the label column from the variable view so that the outputs are easier to read:

Analyze → Dimension Reduction → Factor

Transfer all your variables into the variables column. Then go to Descriptives, select KMO and Bartlett's test of sphericity plus coefficients and significance levels, and press Continue. Then go to Extraction, set the method to principal components, check the scree plot box, and press Continue. Then go to Rotation, select the varimax method, and press Continue. Then go to Options, check "sorted by size" and "suppress small coefficients", enter an absolute value of .4, and press Continue.
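For reference, a hedged Python counterpart to this SPSS run, using the third-party factor_analyzer package; the 41-item DataFrame items is an assumption, and the calls below reflect that package's documented API.

from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

chi_square, p_value = calculate_bartlett_sphericity(items)   # Bartlett's test
kmo_per_item, kmo_overall = calculate_kmo(items)             # KMO adequacy
print(f"Bartlett chi2={chi_square:.2f}, p={p_value:.4f}, KMO={kmo_overall:.3f}")

# Principal-component extraction with varimax rotation, as in SPSS.
fa = FactorAnalyzer(n_factors=7, rotation="varimax", method="principal")
fa.fit(items)
print(fa.loadings_.round(2))   # suppress loadings below |.4| by eye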

Output:

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .930
Bartlett's Test of Sphericity: Approx. Chi-Square = 11785.442, df = 820, Sig. = .000

An exploratory factor analysis was conducted by using principal component analysis (PCA) on
the 41 items with orthogonal rotation (varimax). The Kaiser–Meyer–Olkin measure verified the
sampling adequacy for the analysis, KMO = .930 (‘superb’ according to Field, 2009), and all
KMO values for individual items were > .80, which is well above the acceptable limit of .5
(Field, 2009). Bartlett’s test of sphericity χ² (820) = 11785.44, p < .001, indicated that
correlations between items were sufficiently large for PCA.

[Scree plot omitted; the point of inflexion is marked on the curve.]

An initial analysis was run to obtain eigenvalues for each component in the data. Seven
components had eigenvalues over Kaiser’s criterion of 1 and in combination explained 69.6% of
the variance. The scree plot was slightly ambiguous and showed inflexions that would justify
retaining both components 5 and 7. Given the large sample size, and the convergence of the scree plot and Kaiser's criterion on seven components, this is the number of components that were retained in the final analysis. The rotated component matrix table shows the factor loadings after rotation. The items that cluster on the same components suggest that component 1 represents Usefulness, component 2 Playfulness, component 3 Computer Latency, component 4 Joy, component 5 Information Acquired and component 6 Decision Quality.

Sr. no   Variable               KMO
1        Playful                .903
2        Computer latency       .801
3        Atypical Use           .901
4        Useful                 .914
5        Joy                    .924
6        Information Acquired   .832
7        Decision Quality       .899
         Overall                .930

C. Structural Equation Modeling:


SEM describes a large class of statistical models used to evaluate the validity of substantive theories with empirical data. Statistically, it represents an extension of general linear modeling (GLM) procedures such as ANOVA and multiple regression analysis. It is used to find relationships among latent constructs and is applicable to experimental and non-experimental data, as well as cross-sectional and longitudinal data. It also goes by the aliases "causal modeling" and "analysis of covariance structures". SEM software can test traditional models, but it also permits examination of more complex relationships and models.

The SEM can be divided into two parts.

1. The measurement model: is the part which relates measured variables to latent variables.
2. The structural model: is the part that relates latent variables to one another.

1. Measurement model:
Also known as confirmatory factor analysis (CFA). It is used to find how well the measured variables represent the constructs. Researchers can specify the number of factors required in the data and which measured variable is related to which latent variable. It is a tool used to confirm or reject the measurement theory. At least four constructs and three items per construct should be present in the research. The factor loadings should be greater than 0.9. The chi-square test and goodness-of-fit statistics such as RMR, GFI, NFI and RMSEA are used to check model validity. AMOS is used for CFA: visual paths are drawn in the graphical window and the analysis is performed.

CFA assumptions:
• Data must have multivariate normality.
• Sample size must be greater than or equal to 200 (n ≥ 200).
• Correct a priori model specification.
• Data must come from a random sample.

2. Structural modeling:
It involves various multiple regression models or equations that are estimated simultaneously. This is an effective technique that helps to uncover complex relationships among variables. Path models are sometimes called causal models.

This portion of the model may be identified if:

• None of the latent dependent variables predicts another latent dependent variable.
• When a latent dependent variable does predict another latent dependent variable, the relationship is recursive and the disturbances are not correlated.
• A relationship is recursive if the causal relationship is unidirectional.

In a non-recursive relationship there are two lines between a pair of variables, one pointing from A to B and the other from B to A. Correlated disturbances are indicated by a single line with an arrowhead at each end. When there is a non-recursive relationship between latent dependent variables or disturbances, there is an identification problem.

How does SEM work?
Step 1: Model Specification.
A sound model is theory based. Theory is based on findings in the literature, knowledge in the
field, or one’s educated guesses, from which causes and effects among variables within the
theory are specified. Models are often easily conceptualized and communicated in graphical
forms. In these graphical forms, a directional arrow (→) is universally used to indicate a
hypothesized causal direction. The variables to which arrows are pointing are commonly termed
endogenous variables (or dependent variables) and the variables having no arrows pointing to
them are called exogenous variables (or independent variables). Unexplained co variances
among variables are indicated by curved arrows ( ). Observed variables are commonly
enclosed in rectangular boxes and latent constructs are enclosed in circular or elliptical shapes.

Step 2: Data characteristics.


Score reliability and validity should be considered when selecting measurement instruments for the constructs of interest. Sample size also matters: the sample size required to provide unbiased parameter estimates and accurate model fit information for SEM depends on model characteristics such as model size, as well as on score characteristics of the measured variables such as score scale and distribution.

Step 3: Model estimation.


A properly specified structural equation model often has some fixed parameters and some free
parameters to be estimated from data.

Step 4: Model evaluation.


Once the model parameters have been estimated, we decide whether to retain or reject the hypothesized model.

Step 5: Model modification, alternative models and equivalent models.


When the hypothesized model is rejected based on the goodness-of-fit statistics, SEM researchers are often interested in finding an alternative model that fits the data. Model trimming may also be used. Goodness of fit is essentially a check of how well the model's implied relationships map onto the observed data: the closer the mapping, the better the fit.

D. Confirmatory Factor Analysis.
Parameters:
The parameters of the model are the regression coefficients for paths between variables and the variances/covariances of independent variables. Parameters may be "fixed" to a certain value (usually 0 or 1) or may be estimated. In the diagram, an asterisk (*) represents a parameter to be estimated, while a "1" indicates that the parameter has been fixed to the value 1. When two variables are not connected by a path, the coefficient for that path is fixed at 0.

There are three types of model identification:

1) Under-identified model: there is an infinite number of possible solutions for the parameter estimates.

2) Just-identified model: there is exactly one possible solution for each parameter estimate.

3) Over-identified model: there is more than one possible solution for each parameter estimate.

Interpretation:
AMOS

File → Data File → File Name → aftermissingvariable (SPSS file) → OK

Then copy the rotated component matrix from the SPSS output, go to AMOS → Plugins → Pattern Matrix Model Builder, paste it, and press Create Diagram.

View → Analysis Properties → Output, and check minimization history, standardized estimates, factor score weights and modification indices, then close.

Analyze → Calculate Estimates → standardized estimates

• Model fit:
Model fit assesses how well the model reproduces the relationships in the data.

Three types of models are present in AMOS:

1) Default model: the model defined by the user in AMOS.
2) Saturated model: not specified by the user; all possible parameters are estimated, so it uses up all available degrees of freedom.
3) Independence model: a fully restricted model; no paths are drawn and only the variances of the observed variables are estimated.

Following tests will be done to check the model fitness.

 Chi-square value must be less than 0.05. Then our alternative hypothesis will be
accepted and null will be rejected and that model fits the data.
 Tucker Levis Index, comparative fit index, the normed fit index, GFI, AGFI and
PGFI there value must be greater or equal to 0.9.
 RMSEA and RMR values must be less than 0.05.

Follow these steps:

AMOS → View Text → Model Fit

We find that the NFI, RFI, IFI, TLI, GFI and AGFI values are close to 0.9 and the RMR and RMSEA values are less than 0.05. Since almost all the indices meet the criteria, the PGFI value is ignored.

Now we compute observed composite scores for the latent variables:

Analyze → Data Imputation → Stochastic Regression Imputation → Impute

The item-level work is now finished and we move on to the variable level. Open the new file (untitled 2-C), delete all the previous items, and work with the new composite variables.

• Indices: [AMOS model fit indices table omitted.]
• Reliabilities:
Now we check the reliability of each construct, item-wise, in the after-missing-variables file.

Reliability Statistics

Construct              Cronbach's Alpha   N of Items
Playful                .912               7
Computer latency       .816               4
AtypUse                .931               5
Usefulness             .943               7
Joy                    .939               7
Information acquired   .842               5
Decision quality       .900               8
Overall                .945               41

All the reliabilities are greater than 0.7, so there is no issue with data consistency.
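For reference, Cronbach's alpha can be reproduced outside SPSS from its definition. A minimal sketch, assuming a DataFrame df whose hypothetical columns play1...play7 hold one construct's items:

import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]                           # number of items in the scale
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(round(cronbach_alpha(df[[f"play{i}" for i in range(1, 8)]]), 3))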

E. Feel of data:
The following four steps are used to get a feel for the data.

Step 1: Multiple graphs.


We examine the variables measured on nominal or ordinal scales, for example the number of males and females in the data, or the education level (how many did matric, how many did masters, etc.). Pie charts, histograms and bar charts are used for this purpose.

Step 2: Data’s Descriptive Analysis.


We check two things:

a) Measure of central tendency: describes the average behavior of the data.

b) Measure of dispersion: describes the spread of the data around its center.

Scale     Central tendency       Dispersion
Nominal   Mode                   Range
Ordinal   Mode/median            Interquartile range
Likert    Mode/median (best)     Median deviation
Ratio     Mean                   Standard deviation

Step 3: Exploration of assumptions.

1) Data should follow the normal distribution.
2) Independence.
3) Data should be on at least a Likert scale.
4) Data should satisfy homogeneity of variance.

• Normality:
Normally distributed data form a bell-shaped curve.

I. KURTOSIS AND SKEWNESS:

A kurtosis value of 3 (mesokurtic) and a skewness value of 0 indicate a normal distribution. If:

• K is greater than 3, the distribution is leptokurtic (heavy-tailed).
• K is less than 3, the distribution is platykurtic, flat with little height (light-tailed).
• If the bulk of the data lies to the left with a long right tail, the distribution is positively skewed; if the bulk lies to the right with a long left tail, it is negatively skewed.
II. Q-Q Plot and P-P Plot:

If the data fall on the line, normal distribution exists. If the data deviate from the line, the data are non-normal.

III. NUMERICAL TESTS.

a) Kolmogorov-Smirnov test.
b) Shapiro-Wilk test.

Their null hypothesis is that the data follow the normal distribution. So if p > 0.05 the null is retained, and if p < 0.05 the null hypothesis is rejected. Here we want to retain the null hypothesis, because the alternative hypothesis says that the data are not normally distributed.
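A hedged Python sketch of both numerical tests; SPSS's K-S column uses the Lilliefors correction, which statsmodels provides, so plain scipy kstest against a fitted normal would not match the SPSS output. The series x is an assumed variable column.

from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

w, p_sw = stats.shapiro(x)             # Shapiro-Wilk
d, p_ks = lilliefors(x, dist="norm")   # K-S with Lilliefors correction
print(f"Shapiro-Wilk p={p_sw:.3f}, Lilliefors K-S p={p_ks:.3f}")
# p > 0.05 on both tests: retain H0 that the data are normally distributed.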

Interpretation:

SPSS → Graphs → Legacy Dialogs → Pie Chart → Gender → OK

The chart shows that male respondents (76.05%) outnumber female respondents (23.95%).

SPSS → Analyze → Descriptive Statistics → Frequencies → move Gender to the variable column → OK

Gender
         Frequency   Percent   Valid Percent   Cumulative Percent
Male     289         76.1      76.1            76.1
Female   91          23.9      23.9            100.0
Total    380         100.0     100.0

SPSS → Analyze → Descriptive Statistics → Descriptives → move all variables to the variable column → Options → kurtosis, skewness, mean, std. deviation, range, etc. → OK

For example, the z-score for kurtosis is the kurtosis statistic divided by its standard error:

z = 2.23 / 0.25 = 8.92

This z-value far exceeds 1.96, showing significant excess kurtosis.

Analyze → Descriptive Statistics → Explore → move "information" to the dependent list → Plots → uncheck stem-and-leaf, check normality plots with tests → Continue → OK
Q-Q plots were produced for each variable (Info, Aty, Play, Quality, Joy, Useful and Case no.; plots omitted here), together with the following tests of normality:

Tests of Normality
          Kolmogorov-Smirnov(a)          Shapiro-Wilk
          Statistic   df    Sig.         Statistic   df    Sig.
info      .152        380   .000         .928        380   .000
aty       .152        380   .000         .928        380   .000
play      .076        380   .000         .983        380   .000
qua       .100        380   .000         .949        380   .000
joy       .094        380   .000         .961        380   .000
use       .172        380   .000         .928        380   .000
CaseNo    .058        380   .004         .955        380   .000
a. Lilliefors Significance Correction

From these tests of normality we conclude that all variables have a Kolmogorov-Smirnov p < 0.05 and a Shapiro-Wilk p < 0.001. This shows that the data are non-normal.

• Homogeneity of variance:

We check variance when we have groups: the variance should not differ significantly across groups. The variance of the outcome variable should be the same in each group. Levene's test is used to check homogeneity.

Group Statistics
      Gender   N     Mean     Std. Deviation   Std. Error Mean
dec   Male     289   4.0160   .60583           .03564
      Female   91    4.0431   .53695           .05629

H0: decision quality of male and female are similar.

H1: decision quality of male and female are different.

Analyze → Compare Means → Independent-Samples T Test

• Test variable: decision quality.
• Grouping variable: gender.
• Define the groups as 1 and 2.

Output

Levene's significance value is 0.248, which is greater than 0.05, so the variances are equal and we read the equal-variances-assumed row. The t-value is -0.381, whose absolute value is less than 1.96. We therefore retain the null hypothesis and reject the alternative.

If normality and homogeneity are not present, we apply the Mann-Whitney test instead:

Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples → Mann-Whitney U test

Test Statistics(a)
                         dec
Mann-Whitney U           12920.000
Wilcoxon W               17106.000
Z                        -.251
Asymp. Sig. (2-tailed)   .802
a. Grouping Variable: Gender

The significance value of 0.802 is greater than 0.05, and |Z| = 0.251 is less than 1.96, so the result is insignificant: the groups do not differ.
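A minimal scipy sketch of the same Mann-Whitney comparison; the DataFrame df, the "dec" column and the gender coding (1 = male, 2 = female) are assumptions.

from scipy import stats

male = df.loc[df["Gender"] == 1, "dec"]
female = df.loc[df["Gender"] == 2, "dec"]
u, p = stats.mannwhitneyu(male, female, alternative="two-sided")
print(f"U={u:.0f}, p={p:.3f}")   # p > 0.05: no significant group difference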

• Independence:
Autocorrelation and scatter diagrams are used to check whether each individual filled in the questionnaire with their own opinions or copied someone else's answers.

Step 4: Correlation Analysis.


It tells us about the relationship between two variables (variates). Two things must be kept in mind:

• Strength.
• Direction.

r is the correlation coefficient; its value lies between -1 and +1. If:

• r is from 0.1 to 0.5, the relationship is weak.
• r is from 0.5 to 0.7, the relationship is semi-strong.
• r is from 0.7 to 1, the relationship is strong; the variation and behavior of the two variables coincide. For example, information acquired has a perfect relationship (r = 1) with information acquired itself.
• Types of correlation:
1) Pearson correlation: a parametric test, applied when independence, homogeneity, normality, etc. are present.
2) Spearman rho correlation: a non-parametric test.
3) Kendall tau-b correlation: a non-parametric test used when the sample is small and there are many tied ranks.
4) Partial correlation: used to find the relationship between two variables while controlling for the effect of all other variables.
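A hedged sketch of the first three coefficients in Python with scipy; the column names are assumptions.

from scipy import stats

r, p_r = stats.pearsonr(df["info"], df["comp"])        # parametric
rho, p_rho = stats.spearmanr(df["info"], df["comp"])   # non-parametric
tau, p_tau = stats.kendalltau(df["info"], df["comp"])  # small n, many ties
print(f"Pearson r={r:.3f}, Spearman rho={rho:.3f}, Kendall tau={tau:.3f}")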

INTERPRETATION
Bivariate correlation:

SPSS → Analyze → Correlate → Bivariate → select info and comp latency → Pearson → Continue

Take the two variables, information seeking and computer latency. The null hypothesis says "there is no relationship between information seeking and computer latency". As shown below, we reject the null hypothesis and accept the alternative.

Correlations
                             comp     info
comp   Pearson Correlation   1        .338**
       Sig. (2-tailed)                .000
       N                     380      380
info   Pearson Correlation   .338**   1
       Sig. (2-tailed)       .000
       N                     380      380
**. Correlation is significant at the 0.01 level (2-tailed).

Each variable is perfectly correlated with itself (obviously), so r = 1 along the diagonal of the table. The output shows a Pearson correlation of r = .338, indicating a positive relationship between information seeking and computer latency. The strength of the relationship, .338, is statistically significantly different from 0, with p < 0.05.

Partial Correlation:

SPSS → Analyze → Correlate → Partial

The space labelled Variables is for listing the variables that you want to correlate and the space
labelled Controlling for is for declaring any variables the effects of which you want to control.
In the example I have described, we want to look at the unique effect of information on decision
and so we want to correlate the variables decision and information while controlling all other
variables. If you click on options then another dialog box appears as shown in Figure. If you
haven’t conducted bivariate correlations before the partial correlation then this is a useful way to
compare the correlations that haven’t been controlled against those that have. This comparison
gives you some insight into the contribution of different variables.
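For reference, a partial correlation can be computed by hand as the correlation between regression residuals: regress each variable on the controls and correlate what is left over. A minimal sketch, assuming a DataFrame df with the listed (hypothetical) columns:

import numpy as np
from scipy import stats

def partial_corr(df, x, y, controls):
    # Design matrix of controls plus an intercept column.
    Z = np.column_stack([np.ones(len(df))] + [df[c] for c in controls])
    # Residuals of x and y after removing the controls' influence.
    rx = df[x] - Z @ np.linalg.lstsq(Z, df[x], rcond=None)[0]
    ry = df[y] - Z @ np.linalg.lstsq(Z, df[y], rcond=None)[0]
    return stats.pearsonr(rx, ry)

r, p = partial_corr(df, "info", "dec", ["Gender", "Age", "Education"])
print(f"partial r={r:.3f}, p={p:.4f}")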

Output:

Correlations
Control variables: Gender, Age, Education, Frequency, Experience, comp, aty, play, joy, use, CaseNo

                Correlation   Significance (2-tailed)   df
info with dec   .786          .000                      364

The partial correlation of .786 shows that info and dec remain strongly related even after controlling for all the other variables.

F. Difference of means analysis/ Variance analysis:


We calculate the average of both groups and analyze the difference of means. For example, take two groups A and B: group A receives medicine to treat an illness while group B receives none. We then measure the difference that occurs between medicine use and no medicine use.

There are two types of group designs: independent and dependent. In an independent design we take 2 groups and apply 2 different treatments; if the data are parametric we use the independent t-test, and if non-parametric the Mann-Whitney test. In a dependent design one group receives 2 treatments; if the data are parametric we perform the dependent (paired) t-test, otherwise the Wilcoxon signed-rank test. If we have more than 2 groups (3, 4, 5, etc.) and parametric data, we perform one-way ANOVA; for non-parametric data we use the Kruskal-Wallis test for independent groups or Friedman's ANOVA for repeated measures. A code sketch of the two-independent-group case follows below.
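As referenced above, a minimal scipy sketch of the two-independent-group case, pairing Levene's test with the t-test and its non-parametric alternative; the score arrays a and b are hypothetical.

from scipy import stats

lev_w, lev_p = stats.levene(a, b)                           # homogeneity of variance
t, p_t = stats.ttest_ind(a, b, equal_var=lev_p > 0.05)      # independent t-test
u, p_u = stats.mannwhitneyu(a, b, alternative="two-sided")  # non-parametric
print(f"Levene p={lev_p:.3f}, t-test p={p_t:.3f}, Mann-Whitney p={p_u:.3f}")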

Parametric Test Example:


H0: Decision quality of male and female are similar.

H1: Decision quality of male and female differs.

If p > 0.05 and |t| < 1.96, the null hypothesis is retained. We run the test in SPSS: open the composite file and perform the following steps:

Analyze → Compare Means → Independent-Samples T Test → a pop-up window will appear.

Take decision quality as the test variable and gender as the grouping variable, then define the groups as 1 and 2 as shown in the figure.

On the basis of the variance, standard deviation and mean, there is only a slight difference between the decision quality of males and females. Levene's sig. value is .248 (p > 0.05), so it is insignificant and equal variances are assumed. The t-value is -.381, which in absolute terms (we ignore the sign) is less than 1.96, the criterion for retaining the null hypothesis. So the null is retained and the alternative rejected: the decision quality of males and females is similar.

If Levene's test is insignificant, there is homogeneity of variance; Levene's null hypothesis says that homogeneity of variance exists between the groups.

Non-Parametric Test Example:

H0: Decision Quality of male and female are similar.

H1: Decision Quality of male and female are not similar.

Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples → Mann-Whitney U → Continue

Ranks
      Gender   N     Mean Rank   Sum of Ranks
dec   Male     289   191.29      55284.00
      Female   91    187.98      17106.00
      Total    380

Test Statistics(a)
                         dec
Mann-Whitney U           12920.000
Wilcoxon W               17106.000
Z                        -.251
Asymp. Sig. (2-tailed)   .802
a. Grouping Variable: Gender

The output shows 289 males and 91 females, 380 cases in total. The Z and t statistics play the same role here. We have Z = -.251 and p = .802, so p is insignificant (greater than 0.05) and |Z| is less than 1.96: the null hypothesis is retained.

Testing for homogeneity of variance:

• For grouped data, H0: the variances in the different groups are equal.
• One-way ANOVA is used (each score minus the mean of its group).
• If Levene's test is significant (p ≤ 0.05), the null hypothesis is rejected.
• If it is insignificant (p ≥ 0.05), the null hypothesis is retained.

G. Multivariate regression:
• Regression: used to measure the relationship between the mean value of one variable and corresponding values of other variables.
• Regression analysis: a way to predict an outcome variable from a predictor variable. If more than one predictor variable is used, it is called multiple regression analysis.
• Regression coefficient: any straight line can be defined by two things: (1) the slope (or gradient) of the line (usually denoted by b1); and (2) the point at which the line crosses the vertical axis of the graph (the intercept, b0). These parameters b1 and b0 are known as the regression coefficients. A line with a positive gradient describes a positive relationship, whereas a line with a negative gradient describes a negative relationship.
• Residuals: we are interested in the vertical differences between the line and the actual data, because the line is our model: we use it to predict values of Y from values of the X variable. In regression these differences are usually called residuals rather than deviations, but they are the same thing.
• Assessing the goodness of fit, sum of squares, R and R²: once Nephwick the Line Finder has found the line of best fit, it is important to assess how well this line fits the actual data (the goodness of fit of the model).
• Conditions to apply regression analysis:
1. There should be a relationship between the IV and DV.
2. The relationship should not be spurious; it must be based on theory.
3. The cause should precede the effect in time.

• The assumptions for better results of regression:

1. The relationship between the IV and DV should be linear.
2. X should be a variable (it must vary).
3. X should be non-stochastic and fixed in repeated samples. It must be the outcome of some process that, whenever repeated at any level, gives the same results; for example, the procedure to calculate CGPA is the same for everyone.
4. There should be no relationship between the IVs (no multicollinearity).
5. The spread of the residuals should be constant (homoscedasticity).
6. The error terms/residuals should be serially independent (autocorrelation arises if this condition is not fulfilled).
7. There should be no outliers in the residuals.

• Regression Plots:
Checking Assumptions:
As a final stage in the analysis, you should check the assumptions of the model. We have already looked for collinearity within the data and used the Durbin-Watson statistic to check whether the residuals in the model are independent. We request a plot of *ZRESID against *ZPRED, plus a histogram and a normal probability plot of the residuals. The graph of *ZRESID against *ZPRED should look like a random array of dots evenly dispersed around zero. If this graph funnels out, the chances are that there is heteroscedasticity in the data. If there is any sort of curve in the graph, the chances are that the data have broken the assumption of linearity. Examples of the different outputs follow.

• Multicollinearity tests:
To check multicollinearity, two tests are used:
1. VIF (Variance Inflation Factor): its value should be at or below the tolerable limit (commonly 5); if it is higher, there is a multicollinearity issue.
2. Tolerance: its value should be greater than or equal to 0.1.
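A hedged sketch of both checks using statsmodels (tolerance is simply 1/VIF); the DataFrame X of independent variables is an assumption.

import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

Xc = sm.add_constant(X)   # VIF needs the intercept column included
for i, name in enumerate(Xc.columns[1:], start=1):   # skip the constant
    vif = variance_inflation_factor(Xc.values, i)
    print(f"{name}: VIF={vif:.2f}, tolerance={1/vif:.2f}")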

• Simple regression on SPSS:

Analyze → Regression → Linear

Enter the dependent variable (decision quality) and the independent variables (all remaining variables) → Statistics → Plots → Save (check Unstandardized in the Residuals column), then press OK.

• Estimates: this option is selected by default because it gives us the estimated coefficients of the regression model (i.e. the estimated b-values). Test statistics and their significance are produced for each regression coefficient: a t-test is used to see whether each b differs significantly from zero.
• Model fit: this option is vital and so is selected by default. It provides not only a statistical test of the model's ability to predict the outcome variable, but also the value of R (or multiple R), the corresponding R² and the adjusted R².
• Collinearity diagnostics: this option is for obtaining collinearity statistics such as the VIF, tolerance, eigenvalues of the scaled, uncentered cross-products matrix, condition indexes and variance proportions.
• Durbin-Watson: this option produces the Durbin-Watson test statistic, which tests the assumption of independent errors. Unfortunately, SPSS does not provide the significance value of this test, so you must decide for yourself whether the value is different enough from 2 to be cause for concern.

• DEPENDNT (the outcome variable).
• *ZPRED (the standardized predicted values of the dependent variable based on the model).
• *ZRESID (the standardized residuals, or errors: the standardized differences between the observed data and the values the model predicts).

Output

Model Summary(b)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .891a   .794       .788                .26857                       2.058

a. Predictors: (Constant), use, Education, Gender, Experience, aty, Frequency, play, Age, comp, joy, info
b. Dependent Variable: dec

The value of R² tells us how much of the variation in the dependent variable is explained by the independent variables. In the table above, R² = .794 means that 79.4% of the variation in decision quality is explained by usefulness, atypical use, joy, playfulness, computer latency and information acquired; the remaining variation is due to unknown variables that are not part of this model. An R² of .5 (50%) or higher is considered good.

The Durbin-Watson value and its interpretation:

• 0 to 1.5: positive autocorrelation.
• 2.5 to 4: negative autocorrelation.
• 1.5 to 2.5: no autocorrelation.
• Exactly 2: zero autocorrelation.

In the table above, DW = 2.058 (no autocorrelation).
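For comparison, a minimal statsmodels sketch that reproduces this kind of model summary and Durbin-Watson statistic outside SPSS; df and the column names are assumptions.

import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

X = sm.add_constant(df[["comp", "info", "aty", "play", "joy", "use"]])
model = sm.OLS(df["dec"], X).fit()
print(model.summary())   # R-squared, F statistic, coefficients, t and p values
print("Durbin-Watson:", round(durbin_watson(model.resid), 3))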

ANOVA(a)

Model          Sum of Squares   df    Mean Square   F         Sig.
1 Regression   104.791          6     17.465        242.052   .000b
  Residual     26.914           373   .072
  Total        131.705          379

a. Dependent Variable: dec
b. Predictors: (Constant), use, aty, play, comp, info, joy

The ANOVA tells us whether the model, overall, results in a significantly good degree of
prediction of the outcome variable. However, the ANOVA doesn’t tell us about the individual
contribution of variables in the model.

F is 242.052 which is significant at p<0.05 (because the value in the column labelled Sig. is less
than .001). Therefore, we can conclude that our regression model results in significantly better
prediction of decision quality than if we used the mean value of decision quality. In short, the
regression model overall predicts decision quality significantly well.

[Scatterplot of *ZRESID against *ZPRED omitted.] The plot violates the assumption of homoscedasticity: the points form a funnel shape, becoming more spread out across the graph. This funnel shape is typical of heteroscedasticity and indicates increasing variance across the residuals.

The histogram should look like a normal distribution (a bell-shaped curve). SPSS draws a curve
on the histogram to show the shape of the distribution.

Tests of Normality
                          Kolmogorov-Smirnov(a)          Shapiro-Wilk
                          Statistic   df    Sig.         Statistic   df    Sig.
Unstandardized Residual   .110        380   .000         .965        380   .000

a. Lilliefors Significance Correction

The straight line in this plot represents a normal distribution, and the points represent the
observed residuals. Therefore, in a perfectly normally distributed data set, all points will lie on
the line. In our output the residual points slightly deviate from the straight line.

Coefficients(a)

               Unstandardized Coefficients   Standardized Coefficients                    Collinearity Statistics
Model          B        Std. Error           Beta                        t        Sig.    Tolerance   VIF
1 (Constant)   -.058    .112                                             -.515    .607
  comp         -.079    .027                 -.085                       -2.968   .003    .667        1.500
  info         1.121    .045                 .820                        25.055   .000    .512        1.953
  aty          .059     .023                 .072                        2.587    .010    .708        1.412
  play         -.036    .025                 -.046                       -1.424   .155    .525        1.905
  joy          .006     .029                 .006                        .191     .849    .511        1.957
  use          .138     .030                 .144                        4.564    .000    .550        1.819

a. Dependent Variable: dec

The t statistic tests whether each slope (B) differs significantly from zero. All VIF values are less than 5 and all tolerance values are greater than 0.1, so multicollinearity is not a concern.

INTERPRETATION

A multiple linear regression model was conducted to determine whether DECISION QUALITY (dependent variable) could be predicted from usefulness, joy, computer latency, atypical use, playfulness and information acquired (independent variables). The null hypotheses tested were that the multiple R² was equal to 0 and that the regression coefficients (i.e., the slopes) were equal to 0. The data were already screened and there is no missing data.

• Linearity: review of the partial scatterplots of the independent variables against the dependent variable indicates that linearity is a reasonable assumption. Additionally, a random display of points falling within a small absolute range provided further evidence of linearity.
• Normality: the assumption of normality was tested via examination of the unstandardized residuals. Review of the S-W test for normality (SW = 0.965, df = 380, p < 0.001) and the skewness (-.336) and kurtosis (.484) statistics suggested that normality was not a reasonable assumption. But with the large sample size (n = 380) this non-normality is not too important. The boxplot suggested a relatively normal distributional shape (with no outliers) of the residuals.
• Independence: a relatively random display of points in the scatterplots of standardized residuals against values of the independent variables, and of standardized residuals against predicted values, provided evidence of independence. The Durbin-Watson statistic was computed to evaluate independence of errors and was 2.058, which is considered acceptable. This suggests that the assumption of independent errors has been met.
• Multicollinearity: tolerance was greater than .10 (lowest .511) and the variance inflation factor was less than 5 (highest 1.957), suggesting that multicollinearity was not an issue.
• Results: the multiple linear regression suggests that a significant proportion of the total variation in decision quality was predicted by the independent variables, F(6, 373) = 242.052, p < .001. Additionally, we find the following:

1. For computer latency, the unstandardized partial slope (-0.079) and standardized partial slope (-0.085) are statistically significantly different from 0 (t = -2.968, df = 373, p < 0.05); with every one-point increase in computer latency, decision quality decreases by approximately 0.079 of a point when controlling for the other predictors.
2. For information acquired, the unstandardized partial slope (1.121) and standardized partial slope (0.820) are statistically significantly different from 0 (t = 25.055, df = 373, p < 0.05); for every one-unit change in information acquired, decision quality changes by approximately 1.121 units.
3. For atypical use, the unstandardized partial slope (0.059) and standardized partial slope (0.072) are statistically significantly different from 0 (t = 2.587, df = 373, p < 0.05); every one-unit change in atypical use changes decision quality by approximately 0.059 units.
4. For playfulness, the unstandardized partial slope (-0.036) and standardized partial slope (-0.046) are not statistically significantly different from 0 (t = -1.424, df = 373, p = 0.155).
5. For joy, the unstandardized partial slope (0.006) and standardized partial slope (0.006) are not statistically significantly different from 0 (t = 0.191, df = 373, p = 0.849).
6. For usefulness, the unstandardized partial slope (0.138) and standardized partial slope (0.144) are statistically significantly different from 0 (t = 4.564, df = 373, p < 0.05); for every one-unit increase in usefulness, decision quality increases by approximately 0.138 units.
7. The intercept (-0.058) was not statistically significantly different from 0 (t = -0.515, df = 373, p = 0.607).
8. Multiple R² indicated that approximately 79.4% of the variation in decision quality was predicted by the independent variables (the remainder is due to factors not in the model); according to Cohen (1988), this suggests a large effect.
