Introduction to SPSS-Part 1
Vignes Gopal Krishna

Fast track PhD student, SLAI fellow, and
Research Assistant
University of Malaya
SPSS
Statistical Package/Product for Social Sciences (Economics, Sociology, Population Studies, etc.). Subjects: People/Society
Statistical Package/Product for Sciences (SPS) (Health Sciences, Neurosciences, Medical Sciences, Economics, Sociology, etc.). Subjects: People/Society/Patients/Animals/Neurons
SPSS: Rows x Columns x Cells (RCC)

Rows = Subjects, Columns = Variables, Cells = Values/Statements
SPSS = Main Inputs (Data View and Variable View) x Outputs (Results)
Additional inputs (Scripts & Syntax)
Advantages
Deals with the process of quantifying qualitative data
Numerical presentation of qualitative data (Descriptive and
Inferential Statistics)
Deals with both parametric and non-parametric approaches
Deals with Cross Sectional Data, Time Series Data, and
Panel Data
SPSS Layout
[Figure: SPSS layout, with labels for Menus, Icons, Rows, Columns, and Cells]
SPSS Multi-dimensional Matrix

Will you be able to find the number of rows and columns?
[Figure: Data View and Variable View]
Disadvantages
Does not handle advanced modes of modeling and quantitative techniques (not possible through the menus)
Does not handle advanced techniques for data types (not possible through the menus)
Common measurement
(a) Categorical variable (CAV): Nominal & Ordinal
(b) Continuous variable (COV): Scale (Ratio & Interval)
(c) String: qualitative statements (not important in SPSS; handled by NVivo, QDA Miner, Dedoose, ATLAS.ti, etc.)
Classification variable = a partial element of categorical variable.
Classification variable: a variable that is used to classify qualitative arguments/statements = variable by categories (Categorical variable) + variable by statements (Non-Categorical variable)
Categorical variable
(a) Dichotomous variable (Binomial): 2 values, NO/OR, Independent & Dependent samples
(b) Polychotomous variable (Multinomial): more than 2 values, NO/OR, Independent & Dependent samples
Categorical variable
(a) Constant and fixed
(b) Separated by categories
(c) Gradual change = 0, static
(d) Nominal (no order) and Ordinal (order/rank)
Continuous variables
(a) Not constant and fixed
(b) Separated by ratios and intervals
(c) Gradual change != 0, dynamic
Types of Variables
(a) Binary (Bi + nary) variable: 2 groups coded 0 and 1. Examples: Gender (0=Male, 1=Female), Case and Control (0=Healthy, 1=Disease), Fluctuations (0=Increase, 1=Decrease).
(b) Dichotomous variable: 2 groups (can be any 2 values). Examples: Gender (2=Male, 3=Female), Case and Control (0=Before Treatment, 1=Present Treatment).
(c) Independent variable: stand-alone variable, Cor(x1, x2, x3) = 0. Also called Predictor/Regressor/Indicator.
(d) Dependent variable: relies on factors, Cor(y, x1, x2) != 0. Also called Predictand/Regressand/Outcome.
(e) Confounding variable: distorts the effect of one variable on another. Expanding the matching reduces the effects of confounding.
(f) Control variable: controls the effects of IV on DV.
(g) Controlled variable: another term for Dependent Variable.
(h) Instrumental variable: a variable that has zero correlation with the residuals/error terms but is correlated with the endogenous explanatory variable, and so is related to the dependent variable only through it.
(i) Criterion variable: a variable that has a presumed effect (non-experimental research).
(j) Discrete variable: a variable that takes distinct values.
(k) Dummy variable: similar to a binary variable; a classification variable.
(l) Endogenous variable: inside the system; influenced by variables that enter the system.
(m) Exogenous variable: outside the system; enters the system and influences the endogenous variable.
(n) Interval variable: a form of scale variable.
(o) Ratio variable: a form of scale variable.
(p) Intervening variable: intervenes in the association between the main variables (moderating and mediating variables).
(q) Mediating variable: indirect effect on the association between the main variables.
(r) Moderating variable: indirect effect through interaction effects between related variables.
(s) Polychotomous variable: takes more than 2 values/groups.
(t) Manifest variable: indicator variable that can indicate the presence of a latent variable.
(u) Latent variable: a variable that cannot be measured directly; it has to depend on manifest variables.
(v) Manipulated variable: similar to IV.
(w) Outcome variable: similar to DV (presumed effect).
(x) Predictor variable: similar to IV (presumed cause).
(y) Nominal variable: takes any value; does not follow orders/ranks.
(z) Ordinal variable: takes values based on orders/ranks.
* Treatment variable: similar to IV.
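As a rough illustration of the binary, dummy, and polychotomous variables listed above, the sketch below codes a two-group variable as 0/1 and expands a variable with more than two groups into dummy variables. It uses plain Python rather than SPSS syntax, and the variable names, category labels, and the helper dummy_code are hypothetical.

```python
# Hypothetical raw responses; names and labels are illustrative only.
gender = ["Male", "Female", "Female", "Male"]
education = ["Primary", "Tertiary", "Secondary", "Tertiary"]

# Binary coding: exactly two groups mapped to 0 and 1 (0=Male, 1=Female).
gender_code = {"Male": 0, "Female": 1}
gender_binary = [gender_code[g] for g in gender]


def dummy_code(values, reference):
    """Expand a polychotomous variable into 0/1 dummies, omitting the reference group."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


education_dummies = dummy_code(education, reference="Primary")

print(gender_binary)
print(education_dummies)
```

Omitting one reference group is a common convention when dummies are later used as regressors; it is an assumption here, not something the slides prescribe.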
Types of Quantitative Data

(a) Time Series Data: data that follow a series of time points for a single country/industry/activity/firm/organization/stock market/society, etc., over multiple sampling periods
(b) Cross Sectional Data: data that follow the cross evaluation of various subjects (countries/industries/activities/firms) at a single point of time
(c) Panel Data: Time Series Data + Cross Sectional Data with different characteristics
(d) Pooled Data: combined version of data with similar characteristics
(e) Longitudinal Data: wider scope of data, with variation in timing
Types of Qualitative Data

(a) Factual Data: demographic data (Marital Status, Level of Education, Age, Position, etc.) (Experimental and Non-experimental Data). Yes/No versus Yes/No/Don't know, True or False: which one is preferable?
(b) Positive and Normative Data: Actual versus predicted, Agreement to Disagreement, Likes to Dislikes
(c) Logical Arguments: True or False
(d) Boolean Statements: AND, OR, NOT
Likert Scale (LS) and Scale (S)
LS != S
In a normal case, does Scale refer to ratio or interval?
For example: 5 Levels of Likert Scale

1=Strongly Agree
2=Agree
3=Neither Agree nor Disagree
4=Disagree
5=Strongly Disagree
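To make the point that a Likert Scale is ordinal rather than a true Scale, the sketch below summarizes Likert responses with the median and mode instead of the mean. The coding follows the five levels above; the responses themselves are made up for illustration.

```python
from collections import Counter
from statistics import median

# Coding from the five levels above.
LIKERT = {
    1: "Strongly Agree",
    2: "Agree",
    3: "Neither Agree nor Disagree",
    4: "Disagree",
    5: "Strongly Disagree",
}

# Hypothetical responses from seven respondents.
responses = [1, 2, 2, 3, 2, 4, 5]

counts = Counter(responses)                 # frequency distribution of the codes
mode_code = counts.most_common(1)[0][0]     # most frequent category
median_code = median(responses)             # middle category of the ordered codes

print("Mode:", LIKERT[mode_code])
print("Median:", LIKERT[median_code])
```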
Sample and Population

The association between Sample and
Population can be seen in the context of
Donut
RVRCNB
Approach
Which one is good?
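Parameter and Statistics

Parameter = Population (Actual)
Statistics = Sample (Prediction)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ (Parameter)
$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \hat{\varepsilon}$ (Statistics)
Statistics ~ Parameter (the actual population is unknown, so the population is estimated)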
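The relationship "Statistics ~ Parameter" can be sketched with a toy example. For brevity it compares a population mean with a sample mean instead of regression coefficients, and the population (the integers 1 to 1000) is a hypothetical stand-in, since a real population is usually unknown.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical, fully known population (in practice this is not observed).
population = list(range(1, 1001))
parameter = mean(population)            # population mean (actual)

# A simple random sample of 100 subjects drawn without replacement.
sample = random.sample(population, 100)
statistic = mean(sample)                # sample mean (estimate of the parameter)

print(f"parameter = {parameter}, statistic = {statistic}")
```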
Descriptive and Inferential Statistics

* For quantitative mode of single/multi-purposes
* Descriptive = Describe + Narrative (describing subjects): Single Purpose (SP)
* Inferential = Investigation + Narrative (investigating subjects): Multi Purposes (MP)
Descriptive Analysis (Quantitative research)
(a) Descriptive Statistics (continuous variables): [Mean, Median, Variance, Standard deviation, Max, Min, Range, Skewness, Kurtosis, Standard error of mean, Histogram with normal curve, Normal Q-Q plot, Normal P-P plot] (Uni-variate)
(b) Frequency Distribution (categorical variables): [Mode (similar to frequency), Median, Variance and Standard Deviation, Max, Min, Range] (Uni-variate)
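A minimal sketch of the descriptive measures listed above, using Python's standard library instead of the SPSS menus. The scores and marital-status values are hypothetical, and the variance shown is the sample variance (divisor n - 1).

```python
import statistics as st
from collections import Counter

# Hypothetical continuous variable (e.g. test scores).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": st.mean(scores),
    "median": st.median(scores),
    "variance": st.variance(scores),   # sample variance, divisor n - 1
    "std_dev": st.stdev(scores),       # sample standard deviation
    "min": min(scores),
    "max": max(scores),
    "range": max(scores) - min(scores),
}

# Hypothetical categorical variable: frequency distribution and mode.
marital_status = ["Single", "Married", "Married", "Divorced", "Married"]
frequencies = Counter(marital_status)
mode_category = frequencies.most_common(1)[0][0]

print(summary)
print(dict(frequencies), mode_category)
```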
Inferential Analysis (Quantitative research)
(c) Normality tests: hypothesis testing in SPSS (Shapiro-Wilk and Kolmogorov-Smirnov)
(d) Non-normality tests: hypothesis testing in SPSS (One-Sample Kolmogorov-Smirnov tests for uniform, Poisson, and exponential distributions). Others are possible through Scripts and Syntax
(e) Mean differences: Single mean test, One sample t-test, Two samples (Independent and Dependent sample tests)
(f) Association: Linear and Non-Linear modes of regression
(g) Correlation: Linear and Non-Linear modes of correlation
Types of Samplings
What type of research?
All research starts with a single purpose or multiple purposes: Purposive Sampling
Additional types of samplings
(a) Simple random sampling: samples are selected randomly, with an equal chance of selection (unbiased sampling)
(b) Systematic sampling: samples are selected from an ordered sampling frame
(c) Stratified sampling: the population is divided into homogeneous subgroups, and samples are drawn from each subgroup
(d) Cluster sampling: the population is divided into groups with similar characteristics, and whole groups are selected
(e) Convenience sampling: easy sampling; choose groups of interest.
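The first four sampling types above can be sketched in a few lines. This is only an illustration under stated assumptions: the sampling frame of 20 subject IDs, the urban/rural strata, and the clusters of five are all hypothetical.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical ordered sampling frame of 20 subjects.
frame = [f"S{i:02d}" for i in range(1, 21)]

# (a) Simple random sampling: every subject has an equal chance of selection.
simple = random.sample(frame, 5)

# (b) Systematic sampling: every k-th subject from the ordered frame, random start.
k = len(frame) // 5
start = random.randrange(k)
systematic = frame[start::k]

# (c) Stratified sampling: sample separately within homogeneous subgroups.
strata = {"urban": frame[:12], "rural": frame[12:]}
stratified = {name: random.sample(members, 2) for name, members in strata.items()}

# (d) Cluster sampling: pick whole groups at random and keep all their members.
clusters = [frame[i:i + 5] for i in range(0, len(frame), 5)]
cluster_sample = [s for cluster in random.sample(clusters, 2) for s in cluster]

print(simple, systematic, stratified, cluster_sample, sep="\n")
```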
Sampling with replacement and without replacement
* Tied to the probability of sample selection.
* For example:
Let's say that we have some letters (A, B, C, D, E) and two letters are chosen.
(a) Sampling with replacement: select one letter first and put it back into the sample space. The sample space is:
AA, AB, AC, AD, AE
BA, BB, BC, BD, BE
CA, CB, CC, CD, CE
DA, DB, DC, DD, DE
EA, EB, EC, ED, EE
The probability of choosing at least one A, [AA, AB, AC, AD, AE, BA, CA, DA, EA]: Probability = 9/25 = 0.36
(b) Sampling without replacement: select one letter first and do not put it back into the sample space, so the same letter cannot be selected twice. Using the previous example, the sample space is:
AB, AC, AD, AE
BA, BC, BD, BE
CA, CB, CD, CE
DA, DB, DC, DE
EA, EB, EC, ED
The probability of choosing at least one A, [AB, AC, AD, AE, BA, CA, DA, EA]: Probability = 8/20 = 0.4
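The two sample spaces can be enumerated directly to check the probabilities. The sketch below treats each draw of two letters as an ordered pair, as the listing above does; the helper name prob_at_least_one is hypothetical.

```python
from itertools import permutations, product

letters = "ABCDE"

# With replacement: the same letter may appear twice (5 x 5 = 25 ordered pairs).
with_replacement = ["".join(p) for p in product(letters, repeat=2)]

# Without replacement: the second letter must differ (5 x 4 = 20 ordered pairs).
without_replacement = ["".join(p) for p in permutations(letters, 2)]


def prob_at_least_one(outcomes, target="A"):
    """Share of equally likely outcomes that contain the target letter."""
    hits = [o for o in outcomes if target in o]
    return len(hits) / len(outcomes)


p_with = prob_at_least_one(with_replacement)
p_without = prob_at_least_one(without_replacement)
print(len(with_replacement), p_with)
print(len(without_replacement), p_without)
```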
Dependent and Independent Samples
Dependent Samples: same subjects at different levels (very highly correlated)
Independent Samples: different subjects at the same and different levels (low and moderate correlations)
[Figure: Independent and Dependent samples (Population 1; Samples 1, 2, 3, and 4)]
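To show why the dependent/independent distinction matters for a test of mean differences, the sketch below computes a paired t statistic and a pooled-variance independent t statistic on the same hypothetical before/after scores. Only the test statistics are computed; p-values would need a t distribution, which the standard library does not provide.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical scores for the same five subjects at two levels.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]


def paired_t(x, y):
    """Dependent samples: t statistic on the within-subject differences."""
    d = [b - a for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / sqrt(len(d)))


def independent_t(x, y):
    """Independent samples: pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / sqrt(pooled * (1 / nx + 1 / ny))


t_paired = paired_t(before, after)
t_independent = independent_t(before, after)
print(round(t_paired, 3), round(t_independent, 3))
```

Because the paired version removes between-subject variation, it yields a much larger statistic on these data than the independent version does.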
Sample Size
Should be representative of population size (N)
In a general/normal case, n >= pN (p = 0.5 and above)
Manual computations of sample size (n)
Margin of error/standard error in percentage (when population size is unknown)
$ME = z \sqrt{PP(1 - PP)/n}$
$n = z^2 \, PP(1 - PP) / ME^2$
Computation of sample size with finite population correction factor
$n_{adj} = \frac{nN}{n + (N - 1)}$, where $n_{adj}$ is the corrected sample size
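The manual computation can be sketched as follows. The inputs (z = 1.96 for 95% confidence, PP = 0.5, ME = 0.05, N = 2000) are illustrative assumptions, and the function names are hypothetical.

```python
from math import ceil, sqrt


def margin_of_error(z, pp, n):
    """ME = z * sqrt(PP * (1 - PP) / n)."""
    return z * sqrt(pp * (1 - pp) / n)


def sample_size(z, pp, me):
    """n = z^2 * PP * (1 - PP) / ME^2 (population size unknown)."""
    return z ** 2 * pp * (1 - pp) / me ** 2


def finite_population_correction(n, N):
    """Corrected sample size n * N / (n + (N - 1)) for a population of size N."""
    return n * N / (n + (N - 1))


n0 = sample_size(z=1.96, pp=0.5, me=0.05)
n_adj = finite_population_correction(n0, N=2000)

# Round up, since a fraction of a subject cannot be sampled.
print(ceil(n0), ceil(n_adj))
```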
Useful Software to deal with the selection of sample size
(a) G*Power (http://www.gpower.hhu.de/)
(b) Power sample size (http://biostat.mc.vanderbilt.edu/wiki/Main/PowerSampleSize)
(c) Power Analysis & Sample Size (http://www.ncss.com/software/pass/)
Parametric versus Nonparametric
Introduction
The terms parametric and non-parametric were coined by Jacob Wolfowitz in 1942.
Parametric (distribution is known)
Non-parametric (distribution is unknown)
In my view, this is just a general idea in statistics, and it should be used as a benchmark or baseline for developing the various schools of thought on statistical tests.
Characteristics of parametric approach

(a) Data follow a probability distribution
(b) Tied to probability modes of sampling (simple random sampling, stratified random sampling, systematic random sampling, random cluster, stratified random cluster, complex multi-stage random, random mode of purposive sampling)
(c) Deals with statistical inferences on the distributions of parameters
(d) Always linked with linearity of data (variables and errors/residuals (uncertainty))
(e) Patterns of data (variables and errors/residuals) follow homogeneity
(f) Follows strict assumptions (robust if the assumptions are fulfilled)
I would classify this approach as the classical approach because it does not follow the evolutionary direction of momentum.
Assumptions of parametric approach
(a) Linearity of parameters
(b) Homogeneity of existing variables and omitted variables (error terms/residuals): symmetrical form of distribution.
(c) Dependent variables/residuals should be normally distributed.
(d) Randomness among the selected samples should be maintained (only if random sampling is involved)
(e) Expanded use of non-categorical variables (continuous variables) in the statistical tests.
(f) Minimization of outliers
(g) Mean, mode, and median of the variables are approximately the same (for the case of normal distribution): bell-shaped normal curve.
(h) Does not deal with the process of re-sampling (bootstrapping)
Identifying the statistical approach is an important step that should be taken before moving to the existing statistical tests.
Distributional tests are needed to determine the nature of the data (variables and residuals).
In a simple context, parametric follows the normal distribution.
Distribution tests of normality

Graphical approach
(a)Histogram with normal curve
(b)Box plot
(c)Normal Q-Q plot
(d)Normal P-P plot
(e)Leverage Plot
Numerical approach
Uni-variate tests
(a)Jarque Bera test
(b) Coefficient of variation
(c)Coefficient of Skewness and Kurtosis
(d)Kolmogorov-Smirnov test
(e)Shapiro-Wilk test
(f) Shapiro-Francia test
(g)Anderson-Darling test
Multi-variate tests
(h)Multivariate tests of normality
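As one example of the numerical approach, the sketch below computes the coefficients of skewness and kurtosis and the Jarque-Bera statistic, JB = n/6 * (S^2 + (K - 3)^2 / 4), from population-style moments. The data are hypothetical. Under normality JB is compared with a chi-square distribution with 2 degrees of freedom (5% critical value about 5.99), though that approximation is rough for a sample this small.

```python
from statistics import mean


def skewness_kurtosis(data):
    """Moment-based skewness S and kurtosis K (K = 3 for a normal distribution)."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(data):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    s, k = skewness_kurtosis(data)
    return len(data) / 6 * (s ** 2 + (k - 3) ** 2 / 4)


# Hypothetical symmetric data, so the skewness should be zero.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
s, k = skewness_kurtosis(values)
jb = jarque_bera(values)
print(round(s, 4), round(k, 4), round(jb, 4))
```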
Parametric tests of correlation

(a) Pearson product-moment correlation coefficient (Bivariate analysis)
(b) Stepwise mode of linear regression (Multivariate analysis)
(c) Auxiliary mode of linear regression (Multivariate analysis)
(d) Scatter plot/scatterplot matrix with fit line (linear form) (Bivariate analysis)
Non-parametric tests of correlation
(a) Spearman rank correlation (Bivariate analysis)
(b) Kendall's tau rank correlation (Bivariate analysis)
(c) Stepwise mode of non-linear regression (Multivariate analysis)
(d) Auxiliary mode of non-linear regression (Multivariate analysis)
(e) Scatter plot/scatterplot matrix with fit line (non-linear form) (Bivariate analysis)
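The difference between the Pearson and Spearman coefficients can be seen on a monotonic but non-linear relationship. The sketch below implements both from their definitions (Spearman as the Pearson correlation of average ranks); the data and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but non-linear

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 4), round(r_spearman, 4))
```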
Parametric tests of associations

(a) Linear regression (Bivariate and Multivariate)
(b) Stepwise mode of linear regression (Bivariate and Multivariate)
(c) Auxiliary mode of linear regression (Bivariate and Multivariate)
(d) Linear mode of co-integration tests
(e) Linear mode of causality tests
Non-parametric tests of associations
(a) Non-linear regression (Bivariate and Multivariate)
(b) Logistic regression (LR), where the DV is a categorical variable
* Ordered LR (Ordinal variable)
* Unordered LR (Nominal variable)
(c) Correspondence Analysis, independent samples (Pearson Chi-Square, Contingency Coefficient (Nominal), Phi and Cramer's V (Nominal), Lambda (Nominal))
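For the nominal measures named above, the sketch below computes the Pearson chi-square statistic, Cramer's V, and the contingency coefficient for a two-way table. The 2 x 2 counts are hypothetical; for a 2 x 2 table Cramer's V equals the absolute value of phi.

```python
from math import sqrt

# Hypothetical 2 x 2 table of observed counts (rows: group, columns: outcome).
table = [[30, 10],
         [20, 40]]


def chi_square(observed):
    """Pearson chi-square statistic and total count for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    return chi2, n


chi2, n = chi_square(table)
k = min(len(table), len(table[0]))           # smaller of rows and columns
cramers_v = sqrt(chi2 / (n * (k - 1)))       # equals |phi| for a 2 x 2 table
contingency_c = sqrt(chi2 / (chi2 + n))      # contingency coefficient

print(round(chi2, 4), round(cramers_v, 4), round(contingency_c, 4))
```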
Main features of SPSS: Inferential Statistics

Regression
* Parametric Regression: Linear Regression, Linear Curve Estimation, Linear Weight Estimation & different types of estimation, Linear mode of Scatter Plot, Linear mode of Leverage Plot and Residual Plot, Simultaneous Regression
* Non-Parametric Regression: Non-Linear Regression, Non-Linear Curve Estimation, Non-Linear Weight Estimation & different types of estimation, Non-Linear mode of Scatter Plot, Non-Linear mode of Leverage Plot and Residual Plot, Non-Linear mode of Simultaneous Equation
* Probit Regression, Tobit Regression, Logit Regression
Correlation
* Parametric Correlation: Pearson correlation, Linear mode of Stepwise Regression, Linear mode of Auxiliary Regression, VIF & Tolerance Value, Linear mode of Scatter Plot
* Non-Parametric Correlation: Spearman rank correlation, Kendall's tau-b rank correlation, VIF & Tolerance Value, Non-Linear Stepwise Regression, Non-Linear Auxiliary Regression, Non-Linear mode of Scatter Plot
Parametric mode of testing on differences
Single test of mean
* One sample t-test
Two sample t-test
Dependent Samples
* Paired sample t-test
* ANOVA repeated measures
Independent Samples
* Independent sample t-test
* ANOVA one way/two way/multiple factors
* MANOVA, GANOVA, SPANOVA, ANCOVA, MANCOVA, SPANCOVA
Non-Parametric mode of testing on differences
* Chi-Square test
* Binomial test
2 sample tests
Dependent samples
* Wilcoxon test
* Sign test
* McNemar test
* Marginal Homogeneity
* Friedman test
* Kendall's W test
* Cochran's Q test
Independent samples
* Mann-Whitney U test
* Moses extreme reactions
* Kolmogorov-Smirnov Z
* Wald-Wolfowitz runs test
* Kruskal-Wallis H test
* Median test
* Jonckheere-Terpstra test
