
Learning the Language of the Statistician

The following slides contain many of the symbols we will be using in this
class. These are the symbols that appear in formulas. While I do not require
you to memorize all of the formulas, it is important that you know what these
symbols mean. You will be expected to memorize a few of the simpler formulas
for the departmental final.
To do responsible research, you must assimilate, integrate and apply. This
PowerPoint presentation concentrates on assimilating this basic information.

                           Sample       Sampling Distribution          Population
--------------------------------------------------------------------------------------------------------
Individual Score           $y_i$                                       $y_i$
Sample Size                $n$
Mean                       $\bar{y}$                                   $\mu$ (Mu)
Standard Deviation         $s$          $\sigma/\sqrt{n}$,             $\sigma$ (Sigma)
                                        estimated by $s/\sqrt{n}$
Variance                   $s^2$                                       $\sigma^2$
Sum                        $\Sigma$
Proportion                 $P$                                         $\pi$ (Pi)
Hypothesized Mean                                                      $\mu_0$
Hypothesized Proportion                                                $\pi_0$ (also written p0)

Stating Hypotheses with Symbols

One Sample Hypothesis Test for a Proportion
o Null hypothesis
  H0: $P = \pi_0$. The sample proportion is the same as the population proportion.
o Research hypothesis
  H1: $P \neq \pi_0$. The sample proportion is NOT the same as the population
  proportion. If you have a theory, you can use a one-tailed test and
  indicate that it is greater or less than the population proportion.

One Sample Hypothesis Test for a Mean
o Null hypothesis
  H0: $\bar{y} = \mu_0$. The sample mean is the same as the population mean.
o Research hypothesis
  H1: $\bar{y} \neq \mu_0$. The sample mean is NOT the same as the population mean.
  If you have a theory, you can use a one-tailed test and indicate that it is
  greater or less than the population mean.

Stating Hypotheses with Symbols

Chi Square
o Null hypothesis
  H0: $E = O$. The expected value equals the observed value.
  The dependent variable is NOT contingent on the independent variable
  in the population.
o Research hypothesis
  H1: $E \neq O$. The expected value does not equal the observed value.
  The dependent variable is contingent on the independent variable
  in the population.

NOTE: For an Elaborated Chi Square you simply state that $E = O$ for all of the
independent/dependent combinations for the null hypothesis. For the
research hypothesis you state that $E \neq O$ for at least one of the combinations.
You would actually test each dependent/independent combination
separately.

Stating Hypotheses with Symbols

One-Way Anova - with 2 groups
o Null hypothesis
  H0: $\mu_1 = \mu_2$. The Means are equal, OR the Mean of Group 1 is the same as
  the Mean of Group 2 in the population.
o Research hypothesis
  Two-tailed (the one the computer uses)
  H1: $\mu_1 \neq \mu_2$. The Means are not equal, OR the Mean of Group 1 is not the
  same as the Mean of Group 2 in the population.
  One-tailed (state a direction)
  H1: $\mu_1 < \mu_2$ or $\mu_1 > \mu_2$. The Mean of Group 1 is lower than the Mean of
  Group 2 in the population, or the Mean of Group 1 is higher than the
  Mean of Group 2 in the population.

Stating Hypotheses with Symbols

One-Way Anova - with more than 2 groups*
o Null hypothesis
  H0: $\mu_1 = \mu_2 = \ldots = \mu_k$. The Means of all the groups are equal.
o Research hypothesis
  Two-tailed (the one the computer uses)
  H1: not all of $\mu_1, \mu_2, \ldots, \mu_k$ are equal. The Mean of one group is
  not equal to the Mean of at least one other group.
o * This is still bi-variate. You don't have more variables, only more categories
  in the categorical variable.

Stating Hypotheses with Symbols

Bi-Variate Regression
o Null hypothesis
  H0: $\beta_1 = 0$. The regression slope is not different from 0 in the population.
  There is no relationship between the independent and dependent variables
  in the population.
o Research hypothesis
  H1: $\beta_1 \neq 0$. The Slope is different from 0 in the population.
  There is a relationship between the independent and dependent variables in
  the population.

Multi-Variate Regression
o Null hypothesis
  H0: $\beta_1 = \beta_2 = \ldots = \beta_k = 0$. The regression slopes are not different
  from 0 in the population. There is no relationship between the independent
  variables and the dependent variable in the population.
o Research hypothesis
  H1: at least one $\beta_j \neq 0$. At least one of the Slopes is different from 0
  in the population. There is a relationship between at least one of the
  independent variables and the dependent variable in the population.

Matching Variables with Types of Analysis


Chi-square (2 categorical variables)
  type of car you drive by gender
  race by political preference
  race by eye color
  gender by YES/NO questions
Anova (1 categorical and 1 continuous variable)
  gender by yearly income
  gender by score on self esteem index
  race by yearly income
  political preference by yearly income
  age by whether or not you have children
Bi-Variate Regression (2 continuous variables)
  yearly income by years of education
  years married by marital satisfaction (scale score)
  age by number of children
Multiple Regression (continuous/dummy independent and continuous dependent)
  number of dates per year by yearly income, age, height, gender (dummy variable)
  poverty rates by sex ratio, percent single headed household, percent employed

Statistics That Do Not Use Hypotheses

Confidence Intervals
o We generally do not state a hypothesis for a Confidence Interval.
  Confidence Intervals are used to estimate a population mean or
  proportion based on a sample mean or proportion. Opinion polls
  use Confidence Intervals to predict election results, etc.

Pearson Correlation (correlation coefficient or r)
o We generally do not associate Pearson Correlation Matrices with
  hypotheses. We generally use Pearson Correlation Matrices for
  diagnostic purposes and to test the strength of bi-variate relationships.

Equations/Formulas Z Tests
Z scores
o $Z = \frac{y_i - \mu}{\sigma}$
o Where $y_i$ = individual's score
o $\mu$ = population mean
o $\sigma$ = population standard deviation
o Information needed
  Population mean and standard deviation
o Example of when we would use this
  If you knew an individual's SAT/ACT score, you could
  determine what percentile they scored in (e.g., the 95th percentile),
  OR if you know what percentile they are in, you can
  determine their score.
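
The two directions of the Z score example (score to percentile, percentile to score) can be sketched in Python. This is a minimal illustration that assumes the scores are normally distributed; the function names and the mean of 500 and standard deviation of 100 are hypothetical values chosen for the example, not figures from these slides.

```python
from statistics import NormalDist

def z_score(y, mu, sigma):
    """Z = (y_i - mu) / sigma."""
    return (y - mu) / sigma

def percentile_from_score(y, mu, sigma):
    """Percent of a normal population scoring at or below y."""
    return NormalDist().cdf(z_score(y, mu, sigma)) * 100

def score_from_percentile(pct, mu, sigma):
    """Score that sits at the given percentile of a normal population."""
    z = NormalDist().inv_cdf(pct / 100)
    return mu + z * sigma

# Hypothetical test with population mean 500 and standard deviation 100
z = z_score(650, 500, 100)
print(z, percentile_from_score(650, 500, 100))
print(score_from_percentile(95, 500, 100))
```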

Equations for Inferential Statistics

Summary Statistics
o Mean
  $\bar{y} = \frac{\sum y_i}{n}$
o Median
  Order the values and count up to position $(n + 1)/2$
o Variance
  $s^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1}$
o Standard Deviation
  $s = \sqrt{s^2}$
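
As a rough sketch, the summary statistics can be computed by hand in Python. It assumes the sample variance divides by n - 1, and the function name and data values are made up for illustration.

```python
import math

def summarize(values):
    """Mean, median, sample variance (n - 1 denominator) and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    ordered = sorted(values)
    mid = n // 2
    # Odd n: the middle value. Even n: the average of the two middle values.
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    variance = sum((y - mean) ** 2 for y in values) / (n - 1)
    return {"n": n, "mean": mean, "median": median,
            "variance": variance, "sd": math.sqrt(variance)}

stats = summarize([2, 4, 4, 5, 10])  # hypothetical scores
print(stats)
```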

Inferring a Population Mean or Proportion Based on Sample Mean or Proportion
The following slides focus on how to estimate a population mean or
proportion if we ONLY have a random sample.
In these cases we estimate one point in the population (e.g., the mean IQ
of USU students), BUT we build a confidence interval around this single
point, generally a 95% confidence interval: the point estimate plus or
minus a margin of error.
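
A 95% confidence interval can be sketched as the point estimate plus or minus a margin of error. The sketch below assumes a large random sample, so it uses the normal value 1.96; with small samples a t value would replace it. The function names and inputs are hypothetical.

```python
import math

def mean_confidence_interval(values, z=1.96):
    """Large-sample interval for a mean: y-bar +/- z * s / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((y - mean) ** 2 for y in values) / (n - 1))
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin

def proportion_confidence_interval(p, n, z=1.96):
    """Large-sample interval for a proportion: p +/- z * sqrt(p(1 - p)/n)."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical poll: 50% of 100 respondents favor a candidate
low, high = proportion_confidence_interval(0.5, 100)
print(low, high)
```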

A One or Large Sample Hypothesis Test
In the following slides we compare a sample mean or proportion with a
population mean or proportion. We want to know if our sample mean or
proportion is different from the population mean or proportion.
The population mean or proportion could actually be a mean/proportion
that is specified by a theory or by past research (rather than a number
computed from a population data set).

Equations/Formulas for One Sample Hypotheses Tests

One sample hypothesis test for Proportion
o $Z = \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$
What do the symbols mean
o $P$ = proportion in the sample
o $\pi_0$ = proportion or hypothesized proportion in the population
o $n$ = sample size
o $Z$ = computed statistic

One sample hypothesis test for Mean
o $Z = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
What do the symbols mean
o $\bar{y}$ = mean in the sample
o $\mu_0$ = mean or hypothesized mean in the population
o $n$ = sample size
o $s$ = sample standard deviation, an estimate of the standard deviation in the population
o $s/\sqrt{n}$ = estimate of the standard error, computed as the sample standard
  deviation divided by the square root of the sample size
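
A minimal sketch of both one sample tests follows. It assumes a large sample, so the computed statistic is compared with the standard normal distribution, and it reports a two-tailed p-value. The function names and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value for a computed Z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def z_test_proportion(p_sample, pi_0, n):
    """Z = (P - pi_0) / sqrt(pi_0 (1 - pi_0) / n)."""
    z = (p_sample - pi_0) / math.sqrt(pi_0 * (1 - pi_0) / n)
    return z, two_tailed_p(z)

def z_test_mean(y_bar, mu_0, s, n):
    """Z = (y-bar - mu_0) / (s / sqrt(n))."""
    z = (y_bar - mu_0) / (s / math.sqrt(n))
    return z, two_tailed_p(z)

# Hypothetical sample: mean 105, s = 15, n = 36, hypothesized mean 100
z_mean, p_mean = z_test_mean(105, 100, 15, 36)
print(z_mean, p_mean)
```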

Symbols for Statistics that Infer the Relationship in the Sample to the Population

Statistic      Symbol(s)             Interpretation
Chi Square     $\chi^2$              Chi Square statistic
Regression     $\beta$               beta, slope in population
               $b$                   slope in sample
               $\alpha$              alpha, intercept or constant in prediction formula
               $X_1 \ldots X_k$      value of the X variables
               $\hat{y}$             y-hat or predicted Y
               $\bar{y}$             Y bar or the mean of Y
Anova          $y_i - \mu$           individual score minus Mu, the mean in population

Chi-Square Equation
o $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O = observed value and E = expected value
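
The chi-square statistic sums (O - E)^2 / E over every cell of the table, where O is the observed count and E is the expected count. A minimal sketch is below, assuming the expected count in each cell is (row total x column total) / grand total. The function name and the counts are made up for illustration.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / total  # expected count
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

# Hypothetical 2 x 2 table (e.g., gender by a YES/NO question)
stat, df = chi_square([[10, 20], [20, 10]])
print(stat, df)
```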

Equations/Formulas for Inferential Statistics


Pearson Correlation Coefficient and R2
o Formula
  $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$
o $R^2$ = r squared
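
A minimal sketch of r and R2, using the sum of cross-products divided by the square root of the product of the two sums of squares. The function name and data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear, made-up data
r_squared = r ** 2
print(r, r_squared)
```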

Multiple Regression
Prediction Equation
$\hat{y} = a + b_1 x_1 + b_2 x_2 + b_3 x_3 + \ldots$
$\hat{y}$ = predicted score for the dependent variable
a = intercept or constant
b = slope or parameter estimate for an independent variable: the unit increase
in the Y variable for every 1 unit increase in X
X = value of the X variables taken from the codebook
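
Applying the prediction equation is just the intercept plus each slope times its X value. A minimal sketch follows. The intercept, slopes and X values are invented for illustration and do not come from any fitted model.

```python
def predict(intercept, slopes, x_values):
    """y-hat = a + b1*x1 + b2*x2 + ... for one case."""
    if len(slopes) != len(x_values):
        raise ValueError("need exactly one X value per slope")
    return intercept + sum(b * x for b, x in zip(slopes, x_values))

# Hypothetical model: a = 10, b1 = 2, b2 = 0.5, for a case with x1 = 3, x2 = 4
y_hat = predict(10, [2, 0.5], [3, 4])
print(y_hat)
```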

Equations/Formulas for Inferential Statistics

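Anova
o Formula
  $F = s_B^2 / s_W^2$
o $TSS = \sum y_{ij}^2 - G^2/n$, where G is the grand total of all scores
o $SSB = \sum n_i (\bar{y}_i - \bar{y})^2$, where $n_i$ and $\bar{y}_i$ are the size and mean of group i
o $SSW = TSS - SSB$

TSS = Total Sum of Squares
SSB = Sum of Squares Between
SSW = Sum of Squares Within

$s_B^2 = SSB/(k - 1)$
$s_W^2 = SSW/(n - k)$
$F = s_B^2 / s_W^2$ (the F statistic)
df between = k - 1
df within = n - k
(k = number of groups, n = total number of cases)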
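
The Anova steps can be sketched as follows: compute TSS around the grand mean, SSB from the group means, SSW as the difference, then divide each by its degrees of freedom to get F. The function name and scores are hypothetical, and TSS is written as the sum of squared deviations from the grand mean, which is algebraically the same as the sum of squared scores minus G^2/n.

```python
def one_way_anova(groups):
    """Sums of squares, degrees of freedom and F for a list of groups."""
    scores = [y for g in groups for y in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    tss = sum((y - grand_mean) ** 2 for y in scores)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = tss - ssb
    f = (ssb / (k - 1)) / (ssw / (n - k))
    return {"TSS": tss, "SSB": ssb, "SSW": ssw,
            "df_between": k - 1, "df_within": n - k, "F": f}

# Three hypothetical groups of three scores each
result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```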

Anova and Regression


Sums of Squares
Anova
o TSS = Total Sum of Squares
o SSW = Sum of Squares within each group
o SSB = Sum of Squares between the groups
SSB/TSS = R square, or the proportion of the total sum of squares that is
explained by group membership

Regression
o TSS = Total Sum of Squares
o SSM = Sum of Squares Model
o SSE = Sum of Squares Error

Equations/Formulas for Inferential Statistics

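Two Sample T-test
o Formula
  $t = \frac{\bar{y}_1 - \bar{y}_2}{s_{\bar{y}_1 - \bar{y}_2}}$
o The denominator is the estimated standard error of the difference between
  the two means. This part is computed as follows
  $s_{\bar{y}_1 - \bar{y}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
o Pooled standard deviation
  $s_p = \sqrt{\frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}}$
  where $s_1$ = standard deviation of sample 1 and $s_2$ = standard deviation of sample 2

What symbols mean
o t = computed test statistic (compared with the critical value of t)
o $\bar{y}_1$ = mean of sample one
o $\bar{y}_2$ = mean of sample two
o $n_1$ = size of sample 1 and $n_2$ = size of sample 2
o Degrees of freedom = df = $n_1 + n_2 - 2$
o Uses a T distribution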
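
A minimal sketch of the two sample t-test with a pooled standard deviation: pool the two sample variances, turn the result into the standard error of the difference between means, then divide the difference by it. It assumes equal population variances, and the function name and data are hypothetical.

```python
import math

def pooled_t_test(sample1, sample2):
    """t statistic and degrees of freedom using a pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    var1 = sum((y - mean1) ** 2 for y in sample1) / (n1 - 1)
    var2 = sum((y - mean2) ** 2 for y in sample2) / (n2 - 1)
    # Pooled standard deviation s_p
    sp = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    # Estimated standard error of the difference between the two means
    se = sp * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, n1 + n2 - 2

t, df = pooled_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(t, df)
```

The computed t is then compared with the critical value of t for the given degrees of freedom.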

Equations/Formulas for Inferential Statistics

Mann Whitney
o Focuses on ranks (medians) rather than on means
o Two Groups
o Formula
  $Z = \frac{T_1 - E(T_1)}{\sqrt{Var(T_1)}}$
  $E(T_1) = \frac{n_1 (n + 1)}{2}$
o Steps
  Rank values from smallest to largest
  Sum ranks in smaller group = $T_1$
  Compute $E(T_1)$
  Compute Variance: $Var(T_1) = \frac{n_1 n_2}{n} s^2$
  $s^2 = \frac{\sum (Y_i - \bar{Y})^2}{n - 1}$, where the $Y_i$ are the ranks
o Uses a Z distribution.
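
The Mann Whitney steps can be sketched as below. Two assumptions go beyond what the slide states: tied values receive the average of their ranks, and Var(T1) is taken as (n1 x n2 / n) times the variance of the ranks, which reduces to the usual n1 n2 (n + 1)/12 when there are no ties. The function names and data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_z(smaller_group, larger_group):
    """Z = (T1 - E(T1)) / sqrt(Var(T1)), with T1 from the smaller group."""
    n1, n2 = len(smaller_group), len(larger_group)
    n = n1 + n2
    ranks = average_ranks(list(smaller_group) + list(larger_group))
    t1 = sum(ranks[:n1])
    expected = n1 * (n + 1) / 2
    mean_rank = sum(ranks) / n
    s2 = sum((r - mean_rank) ** 2 for r in ranks) / (n - 1)
    return (t1 - expected) / math.sqrt(n1 * n2 * s2 / n)

z = mann_whitney_z([1, 2, 3], [4, 5, 6, 7])
print(z)
```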

Equations/Formulas for Inferential Statistics

Kruskal Wallis
o Focuses on ranks (medians) rather than on means
o More than Two Groups
o Formula
  $H = \frac{12}{n(n + 1)} \sum \frac{T_k^2}{n_k} - 3(n + 1)$
o $T_k$ = total sum of ranks for the k-th sample
o $n$ = total number of cases
o $n_k$ = number of cases for the k-th sample
o Uses $\chi^2$ Distribution
o Degrees of Freedom = k - 1 (where k is number of groups)
o Use when you want to compare more than two groups, and the distribution is
  not normal.
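
A minimal sketch of the Kruskal Wallis computation, taking the statistic as H = 12/(n(n + 1)) x sum of T^2/n_k, minus 3(n + 1), where T is the rank sum of each group. Ties get average ranks and no tie correction is applied, which is a simplifying assumption. The function name and data are hypothetical.

```python
def kruskal_wallis_h(groups):
    """H statistic and degrees of freedom (k - 1) for a list of groups."""
    combined = sorted(y for g in groups for y in g)
    n = len(combined)
    # Map each distinct value to its (average) rank in the combined data
    rank_of = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and combined[j + 1] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + j) / 2 + 1
        i = j + 1
    total = 0.0
    for g in groups:
        t = sum(rank_of[y] for y in g)  # sum of ranks for this group
        total += t * t / len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return h, len(groups) - 1

h, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h, df)
```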

Equations/Formulas for Inferential Statistics


Formulas for Sample Size

Sample size (n) =

.9604

(+1)

D = margin of error (usually .05)
N = population size
.9604 = a constant related to being at least 95% sure ($1.96^2 \times .5 \times .5 = .9604$)
This sample size is large enough that we can be at least
95% sure we can generalize to the population with a
margin of error of .05.
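
One common finite-population sample size formula built from these symbols is n = .9604 N / (D^2 (N - 1) + .9604). Whether that is exactly the form intended on this slide is an assumption, so treat the sketch below as an illustration of the idea only. The function name and the population sizes are hypothetical.

```python
import math

def sample_size(population_size, margin=0.05, constant=0.9604):
    """Assumed form: n = c*N / (D^2 * (N - 1) + c), rounded up to a whole case."""
    n = constant * population_size / (margin ** 2 * (population_size - 1) + constant)
    return math.ceil(n)

print(sample_size(1000))        # a small population
print(sample_size(10_000_000))  # a very large population
```

Under this assumed form, the required sample size levels off as the population grows, which is why national polls do not need samples much larger than local ones.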

Prepared by Dr. Carol Albrecht
