Factor loadings:
How many factors should be retained?
Factor rotation:
    Orthogonal and oblique rotation:
    Bartlett's test:
    MANOVA:
    ANOVA:
INTERPRETATION
Output:
C. Structural Equation Modeling:
1. Measurement model:
    CFA assumptions:
2. Structural modeling:
How does SEM work?
Step 1: Model Specification.
Step 2: Data characteristics.
Step 3: Model estimation.
Step 4: Model evaluation.
Step 5: Model modification, alternative models and equivalent models.
D. Confirmatory Factor Analysis.
Parameters:
Interpretation:
    • Model fit:
    • Indices:
    • Reliabilities:
E. Feel of data:
Step 1: Multiple graphs.
Step 2: Data's Descriptive Analysis.
Step 3: Exploration of assumptions.
    • Normality:
        Info Q-Q plot:
        Aty Q-Q Plot:
        Play Q-Q Plot:
        Quality Q-Q Plot:
        Joy Q-Q Plot:
        Useful Q-Q Plot:
        Case no. Q-Q Plot:
    • Homogeneity of variance:
        Output
    • Independence:
Step 4: Correlation Analysis.
    Strength.
    Direction.
    Types of correlation:
    INTERPRETATION
    Bivariate correlation:
    Partial Correlation:
F. Difference of means analysis / Variance analysis:
Parametric Test Example:
Non-Parametric Test Example:
G. Multivariate regression:
    Regression:
    Regression analysis:
    Regression coefficient:
    Residuals:
    Assessing the goodness of fit: sum of squares: R and R²:
    Conditions to apply regression analysis:
    The assumptions for better results of regression:
    • Regression Plots:
    • Multicollinearity tests:
    Simple regression on SPSS:
    Output
A. Multivariate Data Analysis:
A variate is a linear combination of variables, and data is the collection of variables. Multivariate analysis therefore involves the observation and analysis of more than one statistical outcome variable at a time.
Data:
Data are the facts and figures on which further analysis is performed.
Types of data
There are three types of data:
1. Time-series data:
We collect data on a single cross section over multiple time periods.
2. Cross-sectional data:
We collect data at a single point in time from multiple areas or cross sections, for example a ratio analysis of all countries, or the motivation or satisfaction level of employees.
3. Panel/longitudinal data:
We collect data over multiple time periods from multiple cross sections. For example, to calculate the GDP or GNP of 5 countries from the year 2001 to 2005, we take data at multiple times from multiple areas.
Measurement scales:
It is necessary to identify and measure the variation in variables; researchers cannot detect variation unless the variable can be measured. Data can be classified into two categories:
1) Nonmetric (qualitative): This type of data includes information that is ranked (the researcher knows the order but not the differences between the values) and information that has no quantitative pattern to it (e.g. male or female). The nonmetric measurement scale has two types, nominal and ordinal.
Nominal scale: used to label subjects or objects so that they can be identified easily, e.g. the value 1 is assigned to male and the value 2 to female.
Ordinal scale: provides no measure of the actual amount or magnitude in absolute terms, only the order of the values, e.g. a Likert scale consisting of levels such as strongly agree = 1, agree = 2, neutral = 3, disagree = 4, strongly disagree = 5.
2) Metric (quantitative): This type of data includes the interval and ratio scales.
Interval scale: has equal intervals between adjacent values but no true zero point.
Ratio scale: allows the researcher to compare intervals or differences and possesses a true zero point, or character of origin.
For example, age may be recorded in classes:
Below 20 years
21-30 years
31-40 years
41 years or above.
When preparing the data, the researcher must:
1) Ensure that nonmetric data are not incorrectly used as metric data, and vice versa.
2) Identify which multivariate technique will be most applicable to the data.
Measurement error and multivariate measurement:
Two important characteristics of a measure are validity and reliability.
Validity: the degree to which a measure accurately captures what it is supposed to measure. Validity has three further types: content validity, criterion-related validity, and construct validity.
Reliability: the degree to which the observed variable measures the true value and is free of error. It has four further types:
1. Test-retest Reliability.
2. Equivalent-Forms or Alternate-Forms Reliability.
3. Split-Half Reliability.
4. Rationale Equivalence Reliability.
• Type II error, or β: is the probability of failing to reject the null hypothesis when it is false.
• Power, or 1 − β: is the probability of rejecting the null hypothesis when it is false.

                      H0 true               H0 false
Fail to reject H0     1 − α (correct)       Type II error (β)
Reject H0             Type I error (α)      Power (1 − β)
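The relationship between β and power can be made concrete with a short calculation. The sketch below computes the power of a hypothetical two-sided one-sample z-test; the effect size, sample size and α are made-up numbers, not taken from these notes:

```python
from scipy.stats import norm

def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test: P(reject H0 | H0 false)."""
    z_crit = norm.ppf(1 - alpha / 2)      # critical value, 1.96 for alpha = .05
    shift = effect_size * n ** 0.5        # how far the statistic is shifted under H1
    # probability the test statistic lands in either rejection region
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

power = z_test_power(effect_size=0.5, n=25)   # hypothetical numbers
beta = 1 - power                              # Type II error probability
```

Note how power and β always sum to 1: increasing the sample size shrinks β and raises power.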
Logit/Logistic Regression: A single nonmetric dependent variable is predicted by
several metric independent variables. This technique is similar to discriminant analysis,
but relies on calculations more like regression.
Multivariate Analysis of Variance (MANOVA) and Covariance: Several metric
dependent variables are predicted by a set of nonmetric (categorical) independent
variables.
Conjoint Analysis: is used to understand respondents’ preferences for products and services. In doing this, it determines the importance of both attributes and levels of attributes based on a smaller subset of combinations of attributes and levels.
Canonical Correlation: Several metric dependent variables are predicted by several
metric independent variables.
Structural Equations Modeling (SEM):
Interdependence techniques: involve the simultaneous analysis of all variables in the set,
without distinction between dependent variables and independent variables.
Principal Components and Common Factor Analysis: analyzes the structure of the
interrelationships among a large number of variables to determine a set of common
underlying dimensions (factors).
Cluster Analysis: groups objects (respondents, products, firms, variables, etc.) so that
each object is similar to the other objects in the cluster and different from objects in all
the other clusters.
Multidimensional Scaling (perceptual mapping): identifies “unrecognized”
dimensions.
Correspondence Analysis: uses non-metric data and evaluates either linear or non-linear
relationships in an effort to develop a perceptual map representing the association
between objects (firms, products, etc.) and a set of descriptive characteristics of the
objects.
Stage 1: Define the Research Problem, Objectives, and Multivariate Technique(s) to be used
Stage 3: Evaluate the Assumptions Underlying the Multivariate Technique(s)
INTERPRETATION
CASE SCREENING:
We screen the data to remove missing values and to find unengaged responses and outliers. If any cases (rows/respondents) did not respond, we remove them from our data.
After this, make a new column to flag unengaged responses. An unengaged respondent is a person who answers every single question with the same value. The flag is computed with Excel's =STDEV.P() formula: select all the quantitative response data to the left and press Enter to get that row's standard deviation. To compute it for all responses, move the cursor to the corner of the first cell until it turns into a plus sign, then double-click and the results fill down automatically. After this, we sort the data from A to Z.
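The same check can be scripted outside Excel. A minimal sketch with made-up Likert responses (the 0.5 cutoff is a judgment call, mirroring the near-zero standard deviations flagged by =STDEV.P):

```python
import numpy as np

# hypothetical responses: one row per respondent, one column per Likert item
responses = np.array([
    [3, 3, 3, 3, 3, 3],   # answers every question identically -> unengaged
    [1, 5, 2, 4, 3, 2],
    [2, 2, 3, 2, 1, 2],
])
row_sd = responses.std(axis=1)    # population SD per row, like Excel's =STDEV.P
unengaged = row_sd < 0.5          # flag respondents with (near-)zero variation
```

Rows flagged True would then be reviewed and removed, exactly as the sorting step above does by bringing zero-SD rows together.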
Open SPSS and follow these steps:
A window appears. Select all the variables except ID and Expact. Under output, tick all the options, type 0 as the minimum percentage of missing values, and press OK. We can impute the data because the output shows only a few missing values.
A window will appear: move all variables measured on a Likert or higher scale (e.g. the independent and dependent variables) into the quantitative column, and those measured on ordinal or nominal scales into the categorical column. On the right side select EM. In the window that appears, select Student's t and enter 5 degrees of freedom. Save the completed data with a file name containing no spaces (dataaftermissingvariables), then continue.
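SPSS's EM procedure is model-based and iterative, but the basic mechanics of filling in missing cells can be illustrated with a much simpler mean-substitution sketch (the data below are hypothetical, and EM goes well beyond this):

```python
import numpy as np

# hypothetical item scores, with a missing response coded as NaN
X = np.array([[4.0, np.nan],
              [2.0, 3.0],
              [3.0, 5.0]])
col_means = np.nanmean(X, axis=0)              # column means, ignoring missing cells
filled = np.where(np.isnan(X), col_means, X)   # substitute the column mean for NaN
```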
Output:
At the bottom of the univariate statistics table, the significance value (Little's MCAR test) should be greater than 0.05. Our value is 0.145, so the data show no pattern. The null hypothesis states that the data are missing completely at random, and we want to retain it, which happens when the significance is high. Since p = 0.145 > 0.05, the null hypothesis is retained and the alternative is rejected.
EM estimated statistics:
Outliers:
An outlier is an observation point that is distant from the other observations. Outliers can bias your model because they affect the values of the estimated regression coefficients.
A new window will appear: move all the variables to the Dependent List, then go to Statistics, select only Outliers, and continue.
Output:
In the stem-and-leaf plot, the values marked with an asterisk (*) are extremes: these respondents' ages are greater than the other respondents'. If that variable matters according to our theory, we delete those records; otherwise we continue. Although we found two cases, ID no. 378 and 380, that are older than all the other cases, our study is not age-sensitive, so we continue with them. Excluding these cases would not affect our data, so we ignore them.
2) Make a questionnaire that measures the underlying variables.
3) Reduce the data set while retaining as much of the original information as possible.
Factor analysis achieves parsimony by explaining the maximum amount of common variance in a correlation matrix using the smallest number of explanatory constructs.
We reduce the data by looking for variables that correlate highly with a group of other variables but do not correlate with variables outside that group. Such a group is called a cluster (factor).
Factor loadings:
Factor loadings are part of the output of factor analysis, which serves as a data-reduction method designed to explain the correlations between observed variables using a smaller number of factors. Loadings can be read as two types of coefficients: correlations between variables and factors (structure coefficients) or regression weights (pattern coefficients).
Our factors have to explain at least 50% of the variance, and each retained factor's eigenvalue must be greater than 1.
Scree plot:
In multivariate statistics, a scree plot is a line plot of the eigenvalues of factors or principal
components in an analysis. The scree plot is used to determine the number of factors to retain
in an exploratory factor analysis or principal components to keep in a principal component
analysis.
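Kaiser's criterion and the scree-plot logic can be sketched numerically: take the eigenvalues of the item correlation matrix, retain those above 1, and track the cumulative variance explained. The data below are simulated (two underlying factors and six items; all numbers are hypothetical, not the survey's):

```python
import numpy as np

rng = np.random.default_rng(42)
f1 = rng.normal(size=(200, 1))                          # latent factor 1
f2 = rng.normal(size=(200, 1))                          # latent factor 2
X = np.hstack([f1 + 0.5 * rng.normal(size=(200, 3)),    # items 1-3 load on f1
               f2 + 0.5 * rng.normal(size=(200, 3))])   # items 4-6 load on f2
R = np.corrcoef(X, rowvar=False)                        # 6x6 correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]          # high -> low, scree order
n_retain = int((eigvals > 1).sum())                     # Kaiser's criterion
explained = eigvals.cumsum() / eigvals.sum()            # cumulative variance explained
```

Plotting `eigvals` against their rank gives the scree plot; here the curve drops sharply after the second eigenvalue, so both rules agree on retaining two factors.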
Factor rotation:
Once the factors have been extracted, we calculate the degree to which each variable loads onto them. Generally, most variables have high loadings on the most important factors and small loadings on all the others, which makes interpretation difficult. So a factor rotation technique is used.
Orthogonal and oblique rotation:
The choice between orthogonal and oblique rotation depends on whether there is a theoretical reason to suppose that the factors should be related or independent, and also on how the variables cluster on the factors before rotation. We use varimax (orthogonal) or promax (oblique) rotation. Delta values should range from 0 up to 0.8.
Now it is important to decide which variables make up which factors. Most researchers treat factor loadings greater than 0.3 as significant, but this depends on the sample size. Stevens (2002) recommends interpreting only factor loadings with an absolute value greater than 0.4 (which explains 16% of the variance in the variable).
Bartlett’s test:
Used to test whether k samples come from populations with equal variance (homogeneity). With 30 or more variables, communalities should be greater than 0.7 for all variables; with fewer than 20 variables, communalities below 0.4 are a concern.
MANOVA:
Used to find the differences in two or more vectors of means.
ANOVA:
Used to find the differences in means between two or more groups.
INTERPRETATION
Now we work on our new file which we saved as “aftermissingvariables”. Open that file and
perform the following steps. Before applying the following steps we will delete the label column
from the variable view so that we can clearly find the outputs:
In variable column transfer all your variables. Then go to descriptive and select KMO and
bartlett’s T test, coefficient and significance level press continue. Then go to extraction, method
will be principle component and check the scree plot column then press continue. Then go to
rotation and select varimax method then press continue. Then go to options and check sorted by
size, suppress small coefficients and also write absolute value (.4) and then continue.
Output:
An exploratory factor analysis was conducted by using principal component analysis (PCA) on
the 41 items with orthogonal rotation (varimax). The Kaiser–Meyer–Olkin measure verified the
sampling adequacy for the analysis, KMO = .930 (‘superb’ according to Field, 2009), and all
KMO values for individual items were > .80, which is well above the acceptable limit of .5
(Field, 2009). Bartlett’s test of sphericity χ² (820) = 11785.44, p < .001, indicated that
correlations between items were sufficiently large for PCA.
Point of inflexion (marked on the scree plot).
An initial analysis was run to obtain eigenvalues for each component in the data. Seven components had eigenvalues over Kaiser's criterion of 1 and in combination explained 69.6% of the variance. The scree plot was slightly ambiguous and showed inflexions that would justify retaining either five or seven components. Given the large sample size, and the convergence of the scree plot and Kaiser's criterion on seven components, seven components were retained in the final analysis. The rotated component matrix table shows the factor loadings after rotation. The items that cluster on the same components suggest that component 1 represents usefulness, component 2 playfulness, component 3 computer latency, component 4 joy, component 5 information acquired, and component 6 decision quality.
1. The measurement model: is the part which relates measured variables to latent variables.
2. The structural model: is the part that relates latent variables to one another.
1. Measurement model:
Also known as confirmatory factor analysis (CFA). It is used to find how well the measured variables represent the constructs. Researchers can specify the number of factors required in the data and which measured variable is related to which latent variable. It is a tool used to confirm or reject the measurement theory. At least four constructs, with three items per construct, should be present in the research. Standardized factor loadings should be high (ideally 0.7 or above). The chi-square test and goodness-of-fit statistics such as RMR, GFI, NFI and RMSEA are used to check model validity. AMOS is used for CFA: visual paths are drawn in the graphical window and the analysis is performed.
CFA assumptions:
Data must have multivariate normality.
Sample size must be greater than or equal to 200 (n ≥ 200).
Correct a priori model specification.
Data must come from a random sample.
2. Structural modeling:
It involves multiple regression models or equations that are estimated simultaneously. This is an effective technique that helps uncover complex relationships among variables. Path modeling is sometimes called causal modeling.
A model is recursive when either:
none of the latent dependent variables predicts another latent dependent variable; or
when a latent dependent variable does predict another latent dependent variable, the relationship is unidirectional and the disturbances are not correlated.
A relationship is recursive if the causal relationship is unidirectional. In a non-recursive relationship there are two lines between a pair of variables, one pointing from A to B and the other from B to A. Correlated disturbances are indicated by a single line with an arrowhead at each end. When there is a non-recursive relationship between latent dependent variables or disturbances, estimation problems arise.
How does SEM work?
Step 1: Model Specification.
A sound model is theory based. Theory is based on findings in the literature, knowledge in the field, or one's educated guesses, from which causes and effects among variables within the theory are specified. Models are often most easily conceptualized and communicated in graphical form. In these graphical forms, a directional arrow (→) is universally used to indicate a hypothesized causal direction. The variables to which arrows point are commonly termed endogenous variables (or dependent variables), and the variables with no arrows pointing to them are called exogenous variables (or independent variables). Unexplained covariances among variables are indicated by curved double-headed arrows (↔). Observed variables are commonly enclosed in rectangular boxes, and latent constructs are enclosed in circular or elliptical shapes.
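These arrow conventions translate directly into a tiny data structure. In the sketch below (the variable names are hypothetical, chosen only for illustration), a path model is a list of (cause, effect) arrows; the endogenous variables are exactly those receiving at least one arrow:

```python
# hypothetical path model: each tuple is an arrow cause -> effect
paths = [
    ("information", "usefulness"),
    ("usefulness", "decision_quality"),
    ("joy", "decision_quality"),
]
endogenous = {effect for _, effect in paths}            # arrows point at these
exogenous = {cause for cause, _ in paths} - endogenous  # nothing points at these
```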
D. Confirmatory Factor Analysis.
Parameters:
The parameters of the model are the regression coefficients for paths between variables and the variances/covariances of independent variables. Parameters may be “fixed” to a certain value (usually “0” or “1”) or may be estimated. In the diagram, an asterisk (*) represents a parameter to be estimated, and a “1” indicates that the parameter has been fixed to the value “1.” When two variables are not connected by a path, the coefficient for that path is fixed at “0.”
There may be more than one possible solution for each parameter estimate.
Interpretation:
AMOS
Copy the rotated component matrix from the SPSS output, then in AMOS go to Plugins → Pattern Matrix Model Builder, paste it, and press Create Diagram.
Then go to View → Analysis Properties → Output and check Minimization history, Standardized estimates, Factor score weights and Modification indices, then close.
• Model fit:
These tests assess the correspondence between the data and the model.
The chi-square p-value should be greater than 0.05 (non-significant): then the null hypothesis that the model fits the data is retained. A significant chi-square (p < 0.05) indicates misfit.
The Tucker–Lewis Index, comparative fit index, normed fit index, GFI, AGFI and PGFI should all be greater than or equal to 0.9.
RMSEA and RMR values must be less than 0.05.
We found that the NFI, RFI, IFI, TLI, GFI and AGFI values are close to 0.9 and the RMR and RMSEA values are less than 0.05. Since all the other indices meet the criteria, the PGFI value will be ignored.
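Among the fit indices, RMSEA has a simple closed form that is worth seeing: it is driven by how much the model chi-square exceeds its degrees of freedom, scaled by sample size. A sketch (the χ², df and N below are hypothetical, not taken from this analysis; some texts use N rather than N − 1 in the denominator):

```python
import math

def rmsea(chi2: float, df: int, n: int) -> float:
    """Point estimate of RMSEA from the model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

fit = rmsea(chi2=250.0, df=200, n=380)   # hypothetical values, under the .05 cutoff
```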
The item-level work is now complete, and next we work on the variables. Open a new file (Untitled 2-C), delete all the previous items, and work on the new variables.
• Indices:
• Reliabilities:
Now we check the reliability of each construct, item-wise, on the file after missing-value treatment.

Reliability Statistics
Construct               Cronbach's Alpha    N of Items
Playful                 .912                7
Computer latency        .816                4
AtypUse                 .931                5
Usefulness              .943                7
Joy                     .939                7
Information acquired    .842                5
Decision quality        .900                8
Overall                 .945                41
All the reliabilities are greater than 0.7, so there is no issue of internal consistency.
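Cronbach's alpha itself is easy to compute from an item-score matrix, which makes the table above less of a black box. A sketch (the toy scores below are made up; three perfectly correlated items give alpha = 1):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # per-item sample variance
    total_var = items.sum(axis=1).var(ddof=1)      # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# toy example: respondents x items, every item identical -> perfect consistency
scores = np.array([[1, 1, 1], [2, 2, 2], [4, 4, 4], [5, 5, 5]], dtype=float)
alpha = cronbach_alpha(scores)
```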
E. Feel of data:
The following four steps will be followed to check the feel of the data.
b) Measures of dispersion: describe how the data spread around the centre.
• Normality:
The data should be bell-shaped, which shows that they are normally distributed. Kurtosis should equal 3 (mesokurtic) and skewness should be 0 (symmetric); then we can say that the data are normally distributed.
On a Q-Q plot, if the data fall on the line then a normal distribution exists; if the data deviate from the line, the distribution is non-normal.
The null hypothesis of these tests is that the data follow a normal distribution. If p > 0.05 the null is retained, and if p < 0.05 it is rejected. Here we want to retain the null hypothesis, because the alternative hypothesis states that the data are not normally distributed.
Interpretation:
The data show that the proportion of males, 76.05%, is greater than that of females, 23.95%.
Gender
                Frequency    Percent    Valid Percent    Cumulative Percent
Valid  Male     289          76.1       76.1             76.1
       Female   91           23.9       23.9             100.0
       Total    380          100.0      100.0
*For example:
Z = kurtosis statistic / standard error = 2.23 / 0.25 = 8.92.
*This shows that significant kurtosis is present, since the z-value exceeds the critical value of 1.96.
Analyze → Descriptive Statistics → Explore → move information into the Dependent List → Plots → uncheck Stem-and-leaf, check Normality plots with tests → Continue → OK.
Info Q-Q plot:
(Q-Q plot and Tests of Normality table: Kolmogorov–Smirnov and Shapiro–Wilk.)
Aty Q-Q Plot:
Tests of Normality
        Kolmogorov-Smirnov(a)           Shapiro-Wilk
        Statistic   df     Sig.         Statistic   df     Sig.
aty     .152        380    .000         .928        380    .000
a. Lilliefors Significance Correction
Play Q-Q Plot:
Tests of Normality
        Kolmogorov-Smirnov(a)           Shapiro-Wilk
        Statistic   df     Sig.         Statistic   df     Sig.
play    .076        380    .000         .983        380    .000
a. Lilliefors Significance Correction
Quality Q-Q Plot:
(Q-Q plot and Tests of Normality table: Kolmogorov–Smirnov and Shapiro–Wilk.)
Joy Q-Q Plot:
(Q-Q plot and Tests of Normality table: Kolmogorov–Smirnov and Shapiro–Wilk.)
Useful Q-Q Plot:
(Q-Q plot and Tests of Normality table: Kolmogorov–Smirnov and Shapiro–Wilk.)
Case no. Q-Q Plot:
(Q-Q plot and Tests of Normality table: Kolmogorov–Smirnov and Shapiro–Wilk.)
From the above tests of normality we conclude that every variable has a Kolmogorov–Smirnov p < 0.05 and a Shapiro–Wilk p < 0.001. This shows that the data are non-normal.
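The Kolmogorov–Smirnov and Shapiro–Wilk decisions above can be reproduced in code. A sketch on simulated data (a clearly skewed sample of the same size as this survey, so both tests should reject normality; the numbers are synthetic, not the survey's):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=380)   # clearly non-normal data

w_stat, p_sw = stats.shapiro(sample)            # Shapiro-Wilk test
# K-S against a normal with the sample's own mean/sd (Lilliefors-style comparison)
ks_stat, p_ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std()))
# p < 0.05 on both tests -> reject the null hypothesis of normality
```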
(Group Statistics table: N, mean, standard deviation and standard error of the mean for each Gender group.)
Output
The significance value of Levene's test is 0.248, which is greater than 0.05: the result is insignificant and equal variances are assumed. The equal-variances t-value is -0.381, whose absolute value is less than 1.96. We therefore retain the null hypothesis and reject the alternative.
Test Statistics(a)
                          dec
Mann-Whitney U            12920.000
Wilcoxon W                17106.000
Z                         -.251
Asymp. Sig. (2-tailed)    .802
a. Grouping Variable: Gender
p = 0.802 is insignificant because it is greater than 0.05, and |Z| = 0.251 is below 1.96.
• Independence:
Autocorrelation and a scatter diagram will be used to check whether each individual filled in the questionnaire with his own opinions or copied someone else's answers.
A correlation has two properties: strength and direction.
R is the correlation coefficient; its value lies between -1 and +1.
Types of correlation:
1) Pearson correlation: a parametric test, applied when independence, homogeneity of variance, normality, etc. hold.
2) Spearman's rho correlation: a non-parametric test.
3) Kendall's tau-b correlation: a non-parametric test used when the sample is small and there are many ties.
4) Partial correlation: used to find the relationship between two variables while controlling for the effect of all other variables.
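The first three coefficients are all available in scipy, which makes the distinction concrete. A sketch on a small made-up sample of paired observations:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]          # hypothetical paired scores

r, p_r = stats.pearsonr(x, y)          # parametric
rho, p_rho = stats.spearmanr(x, y)     # non-parametric, rank-based
tau, p_tau = stats.kendalltau(x, y)    # non-parametric, robust to small n and ties
```

All three report a strong positive association here, but they can diverge when the relationship is monotonic rather than linear, or when ties are common.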
INTERPRETATION
Bivariate correlation:
Take two variables, information seeking and computer latency. The null hypothesis says “there is no relationship between information seeking and computer latency.” We will have to reject the null hypothesis and accept the alternative.
Correlations
                               comp       info
comp    Pearson Correlation    1          .338**
        Sig. (2-tailed)                   .000
        N                      380        380
info    Pearson Correlation    .338**     1
        Sig. (2-tailed)        .000
        N                      380        380
**. Correlation is significant at the 0.01 level (2-tailed).
Each variable is perfectly correlated with itself, so r = 1 along the diagonal of the table. The output shows a Pearson correlation of r = .338 between information seeking and computer latency, indicating a positive relationship. The strength of the relationship, .338, is statistically significantly different from 0, with p < 0.05.
Partial Correlation:
The space labelled Variables is for listing the variables that you want to correlate and the space
labelled Controlling for is for declaring any variables the effects of which you want to control.
In the example I have described, we want to look at the unique effect of information on decision
and so we want to correlate the variables decision and information while controlling all other
variables. If you click on options then another dialog box appears as shown in Figure. If you
haven’t conducted bivariate correlations before the partial correlation then this is a useful way to
compare the correlations that haven’t been controlled against those that have. This comparison
gives you some insight into the contribution of different variables.
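For a single control variable the partial correlation has a closed form, which shows exactly what “controlling for” does: it removes the part of each correlation explained by the control. A sketch (the r values below are hypothetical, not the analysis output):

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# if z is unrelated to x and y, controlling for it changes nothing
unchanged = partial_corr(0.5, 0.0, 0.0)
# if z drives both x and y, their unique association shrinks sharply
shrunk = partial_corr(0.5, 0.7, 0.7)
```

Comparing `unchanged` and `shrunk` mirrors the bivariate-versus-partial comparison the notes recommend.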
Output:
Correlations
Control Variables: Gender & Age & Education & Frequency & Experience & comp & aty & play & joy & use & CaseNo
                                     info       dec
info    Correlation                  1.000      .786
        Significance (2-tailed)      .          .000
        df                           0          364
dec     Correlation                  .786       1.000
        Significance (2-tailed)      .000       .
        df                           364        0
The value .786 shows a strong partial correlation between info and dec.
There are two types of group designs: independent and dependent. With independent groups we take two groups and apply a different treatment to each; if the data are parametric we use the independent t-test, and if they are non-parametric we use the Mann-Whitney test. With a dependent design, one group receives both treatments; if the data are parametric we perform the dependent (paired) t-test, otherwise the Wilcoxon Signed-Rank test. If we have more than two groups (3, 4, 5, etc.) and the data are parametric, we perform a One-Way ANOVA; for non-parametric data the usual counterpart is the Kruskal-Wallis test for independent groups (the Friedman ANOVA is its repeated-measures analogue).
If p > 0.05 and |t| < 1.96, the null hypothesis is accepted (we fail to reject it). We test our assumptions in SPSS.
Open the composite file and perform the following steps:
On the basis of the variance, standard deviation and mean, there is only a slight difference between the decision quality of males and females. The sig. value for Levene's test is .248, which is p > 0.05, so it is insignificant and equal variances can be assumed. The t-value is -.381, whose absolute value is less than 1.96, the criterion for accepting the null hypothesis (we ignore the sign and look at the absolute value). So the null hypothesis is accepted and the alternative is rejected: decision quality is similar for males and females.
If Levene's test is insignificant it means there is homogeneity of variance. The null hypothesis of Levene's test is that homogeneity of variance exists between the groups.
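The same two-step logic (Levene's test first, then the appropriate t-test) can be sketched in Python with scipy. The group sizes mirror the example, but the scores themselves are simulated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical decision-quality scores for two independent groups
# (289 males, 91 females, as in the example; values are simulated).
male = rng.normal(3.8, 0.6, 289)
female = rng.normal(3.8, 0.6, 91)

# Levene's test: null hypothesis = the group variances are equal.
lev_stat, lev_p = stats.levene(male, female)

# If Levene's p > .05 we assume equal variances; otherwise Welch's t-test.
t_stat, t_p = stats.ttest_ind(male, female, equal_var=lev_p > 0.05)
print(f"Levene p = {lev_p:.3f}, t = {t_stat:.3f}, p = {t_p:.3f}")
```

If |t| < 1.96 and p > 0.05, we conclude, as in the text, that the two groups do not differ.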
Analyze → Nonparametric Tests → Legacy Dialogs → 2 Independent Samples → Mann-Whitney U test → Continue
Ranks

       Gender   N     Mean Rank   Sum of Ranks
dec    Male     289   191.29      55284.00
       Female   91    187.98      17106.00
       Total    380

Test Statistics(a)

                          dec
Mann-Whitney U            12920.000
Wilcoxon W                17106.000
Z                         -.251
Asymp. Sig. (2-tailed)    .802

a. Grouping Variable: Gender
The output shows that there are 289 males and 91 females, for a total of 380 cases. The Z statistic plays the same role here as the t statistic. We have Z = -.251 and p = .802, so p is insignificant and |Z| is less than 1.96.
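A minimal sketch of the Mann-Whitney test in Python, using simulated ordinal scores with the same group sizes as the output above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical 1-5 ordinal decision-quality scores for the two groups.
male = rng.integers(1, 6, 289)
female = rng.integers(1, 6, 91)

# Mann-Whitney U: non-parametric test for two independent groups.
u, p = stats.mannwhitneyu(male, female, alternative="two-sided")
print(f"U = {u}, p = {p:.3f}")
```

As in the SPSS output, a p-value above .05 means the two groups' distributions do not differ significantly.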
G. Multivariate regression:
Regression: It is used to measure the relationship between the mean value of one variable and the corresponding values of other variables.
Regression analysis: a way to predict an outcome variable from a predictor variable. If more than one predictor variable is used, it is called multiple regression analysis.
Regression coefficient: Any straight line can be defined by two things: (1) the slope
(or gradient) of the line (usually denoted by b1); and (2) the point at which the line
crosses the vertical axis of the graph (known as the intercept of the line, b0). These
parameters b1 and b0 are known as the regression coefficients. A line that has a gradient
with a positive value describes a positive relationship, whereas a line with a negative
gradient describes a negative relationship.
Residuals: we are interested in the vertical differences between the line and the actual
data because the line is our model: we use it to predict values of Y from values of the X
variable. In regression these differences are usually called residuals rather than
deviations, but they are the same thing.
Assessing the goodness of fit: sum of squares: R and R2: Once Nephwick the
Line Finder has found the line of best fit it is important that we assess how well this line
fits the actual data (we assess the goodness of fit of the model).
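The quantities defined above (the intercept b0, the slope b1, the residuals, and R² as a measure of goodness of fit) can be computed directly from the least-squares formulas. A minimal sketch in Python with simulated data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative predictor X and outcome Y with a positive linear trend.
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)

# Least-squares estimates of the slope (b1) and intercept (b0).
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# Residuals: vertical distances between observed Y and the fitted line.
y_hat = b0 + b1 * x
residuals = y - y_hat

# R^2: proportion of variation in Y explained by the model.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, R^2 = {r_squared:.3f}")
```

A positive b1 corresponds to the positive relationship described above, and R² close to 1 means the line fits the data well.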
Conditions to apply regression analysis:
1. There should be a relationship between the IV and DV.
2. The relationship should not be spurious; it must be based on theory.
3. The cause should precede the effect in time.
• Regression Plots:
Checking Assumptions:
As a final stage in the analysis, you should check the assumptions of the model. We have already looked for collinearity within the data and used the Durbin-Watson statistic to check whether the residuals in the model are independent. A plot of *ZRESID against *ZPRED, a histogram, and a normal probability plot of the residuals are shown. The graph of *ZRESID against *ZPRED should look like a random array of dots evenly dispersed around zero. If this graph funnels out, the chances are that there is heteroscedasticity in the data. If there is any sort of curve in the graph, the chances are that the data have broken the assumption of linearity. The following are examples of different output results.
• Multicollinearity tests:
To check for multicollinearity, two statistics are examined:
1. VIF (Variance Inflation Factor): its value should be at or below the tolerable limit (commonly 5); higher values indicate a problem.
2. Tolerance: its value should be greater than or equal to 0.1.
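VIF and tolerance can be computed by hand: regress each predictor on all the others, take that regression's R², and then VIF = 1/(1 - R²) and tolerance = 1/VIF. A sketch with simulated predictors, where x2 is deliberately built to correlate with x1:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 380

# Three hypothetical predictors; x2 is deliberately correlated with x1.
x1 = rng.normal(0, 1, n)
x2 = 0.8 * x1 + rng.normal(0, 0.6, n)
x3 = rng.normal(0, 1, n)
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of predictor j: 1 / (1 - R^2) from regressing it on the others."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

for j in range(X.shape[1]):
    v = vif(X, j)
    print(f"x{j+1}: VIF = {v:.2f}, tolerance = {1/v:.2f}")
```

The correlated pair x1/x2 gets an inflated VIF while the independent x3 stays near 1, illustrating the thresholds above.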
Select the dependent variable (decision quality) and the independent variables (all remaining variables) → Statistics → Plots → Save (check Unstandardized in the Residuals column) → OK.
Estimates: This option is selected by default because it gives us the estimated
coefficients of the regression model (i.e. the estimated b-values). Test statistics and their
significance are produced for each regression coefficient: a t-test is used to see whether
each b differs significantly from zero.
Model fit: This option is vital and so is selected by default. It provides not only a
statistical test of the model’s ability to predict the outcome variable, but also the value of
R (or multiple R), the corresponding R2 and the adjusted R2.
Collinearity diagnostics: This option is for obtaining collinearity statistics such as the
VIF, tolerance, eigenvalues of the scaled, not centered cross-products matrix, condition
indexes and variance proportions.
Durbin-Watson: This option produces the Durbin–Watson test statistic, which tests the
assumption of independent errors. Unfortunately, SPSS does not provide the significance
value of this test, so you must decide for yourself whether the value is different enough
from 2 to be cause for concern.
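Since SPSS gives no significance value, it helps to know that the Durbin-Watson statistic is simple to compute from the residuals in case order, which makes the "close to 2" rule easy to check by hand. A sketch with simulated independent residuals:

```python
import numpy as np

rng = np.random.default_rng(11)

# Hypothetical residuals from a regression model, in case order.
resid = rng.normal(0, 1, 380)

# Durbin-Watson: sum of squared successive differences divided by the
# sum of squared residuals.  Values near 2 suggest independent errors;
# values far below or above 2 are cause for concern.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(f"Durbin-Watson = {dw:.3f}")
```

Independent residuals give a value near 2, as here; positively autocorrelated residuals push the statistic toward 0, and negatively autocorrelated residuals push it toward 4.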
DEPENDNT (the outcome variable).
*ZPRED (the standardized predicted values of the dependent variable based on the
model). These values are standardized forms of the values predicted by the model.
*ZRESID (the standardized residuals, or errors). These values are the standardized differences between the observed data and the values that the model predicts.
Output

Model Summary(b)
[table image: R, R², adjusted R² and standard error of the estimate]
a. Predictors: (Constant), use, Education, Gender, Experience, aty, Frequency, play, Age, comp, joy, info
b. Dependent Variable: dec
The value of R² tells us how much of the variation in the dependent variable is explained by the independent variables. Here, R² = .794 means that 79.4% of the variation in decision quality is explained by usefulness, atypical use, joyful, playful, computer latency, information acquired and the other predictors; the remaining variation is due to unknown variables that are not part of this model. As a rough benchmark, an R² of .5 (50%) or more is considered good.
ANOVA(a)
The ANOVA tells us whether the model, overall, results in a significantly good degree of
prediction of the outcome variable. However, the ANOVA doesn’t tell us about the individual
contribution of variables in the model.
F is 242.052, which is significant at p < 0.05 (the value in the column labelled Sig. is less than .001). Therefore, we can conclude that our regression model gives significantly better prediction of decision quality than using the mean value of decision quality would. In short, the regression model overall predicts decision quality significantly well.
This plot violates the assumption of homoscedasticity. Note that the points form the shape of a funnel, becoming more spread out across the graph. This funnel shape is typical of heteroscedasticity and indicates increasing variance across the residuals.
The histogram should look like a normal distribution (a bell-shaped curve). SPSS draws a curve
on the histogram to show the shape of the distribution.
Tests of Normality
[table image: Kolmogorov-Smirnov(a) and Shapiro-Wilk statistics for the residuals]
The straight line in this plot represents a normal distribution, and the points represent the
observed residuals. Therefore, in a perfectly normally distributed data set, all points will lie on
the line. In our output the residual points slightly deviate from the straight line.
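Formal normality tests such as Shapiro-Wilk can supplement the P-P plot; their null hypothesis is that the data are normally distributed, so p > .05 supports the assumption. A sketch in Python with simulated standardized residuals (not the model's actual residuals):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(21)

# Hypothetical standardized residuals from the regression model.
resid = rng.normal(0, 1, 380)

# Shapiro-Wilk: null hypothesis = data come from a normal distribution,
# so a p-value above .05 supports the normality assumption.
w, p = stats.shapiro(resid)
print(f"W = {w:.3f}, p = {p:.3f}")
```

Slight deviations in the P-P plot with a non-significant Shapiro-Wilk p-value are usually acceptable in large samples.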
Coefficients
[table image: unstandardized and standardized coefficients with collinearity statistics]
The b-values give the slopes, and a t statistic tests whether each b differs significantly from zero. All VIF values are less than 5 and all tolerance values are above 0.1.
INTERPRETATION
Normality: The slight deviation of the residual points from the line is not too important. The boxplot suggested a relatively normal distributional shape of the residuals, with no outliers.
Independence: A relatively random display of points in the scatterplots of standardized residuals against values of the independent variables, and of standardized residuals against predicted values, provided evidence of independence. The Durbin-Watson statistic, computed to evaluate independence of errors, was 2.058, which is considered acceptable. This suggests that the assumption of independent errors has been met.
Multicollinearity: Tolerance was greater than .10 (.511) and the variance inflation factor was less than 5 (1.957), suggesting that multicollinearity was not an issue.
Assumption Results: The results of the multiple linear regression suggest that a significant proportion of the total variation in decision quality was predicted by the independent variables, F (6, 380) = 242.052, p < .001. Additionally, we find the following:
1. For computer latency, the unstandardized partial slope (-0.79) and standardized partial slope (-0.85) are statistically significantly different from 0 (t = -2.968, df = 380, p < 0.05); with every one-point increase in computer latency, decision quality decreases by approximately 0.79 of a point when controlling for information acquired.
2. For information acquired, the unstandardized partial slope (1.121) and standardized partial slope (0.802) are statistically significantly different from 0 (t = 25.055, df = 380, p < 0.05); for every one-unit change in information acquired, decision quality also changes by approximately 1.121 units.
3. For atypical use, the unstandardized partial slope (0.059) and standardized partial slope (0.072) are statistically significantly different from 0 (t = 2.587, df = 380, p < 0.05); every one-unit change in atypical use changes decision quality by approximately 0.059 units.
4. For playful, the unstandardized partial slope (-0.036) and standardized partial slope (-0.046) are not statistically significantly different from 0 (t = -1.141, df = 380, p > 0.05); the estimate suggests that a one-unit increase in playful decreases decision quality by approximately 0.036 units.
5. For joyful, the unstandardized partial slope (0.006) and standardized partial slope (0.006) are not statistically significantly different from 0 (t = 0.191, df = 380, p > 0.05); the estimate suggests that a one-unit increase in joyful increases decision quality by approximately 0.006 units.
6. For usefulness, the unstandardized partial slope (0.138) and standardized partial slope (0.144) are statistically significantly different from 0 (t = 4.564, df = 380, p < 0.05); every one-unit increase in usefulness increases decision quality by approximately 0.138 units.
7. The intercept (-0.58) was not statistically significantly different from 0 (t = -0.515, df = 380, p > 0.05).
8. Multiple R² indicated that approximately 79.4% of the variation in decision quality was predicted by the independent variables; the remaining variation is due to factors not part of this model. According to Cohen (1988), this suggests a large effect.
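The coefficient table behind this interpretation (b, t and p for each predictor) can be reproduced for any data set with ordinary least squares. The sketch below uses simulated data whose coefficient signs echo the text; the variable names and values are illustrative only, not the survey data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 380

# Hypothetical predictors mirroring the text: computer latency (comp),
# information acquired (info) and usefulness (use).  The true
# coefficients echo the signs reported above.
comp = rng.normal(0, 1, n)
info = rng.normal(0, 1, n)
use = rng.normal(0, 1, n)
dec = -0.08 * comp + 1.12 * info + 0.14 * use + rng.normal(0, 0.5, n)

# OLS fit with an intercept column.
X = np.column_stack([np.ones(n), comp, info, use])
beta, *_ = np.linalg.lstsq(X, dec, rcond=None)

# Standard errors, t statistics and two-sided p-values for each b.
resid = dec - X @ beta
df = n - X.shape[1]
sigma2 = resid @ resid / df
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t = beta / se
p = 2 * stats.t.sf(np.abs(t), df)

for name, b, tv, pv in zip(["const", "comp", "info", "use"], beta, t, p):
    print(f"{name:>5}: b = {b:+.3f}, t = {tv:+.2f}, p = {pv:.4f}")
```

Each printed row corresponds to one numbered point in the interpretation: a large |t| with p < .05 means that predictor's slope differs significantly from zero.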