
COMMON STATISTICAL DISTRIBUTIONS
Summary by: Gerónimo Maldonado-Martínez
Biostatistician
Data Management & Statistical Research Support Unit
Universidad Central del Caribe

Remember hypothesis testing?
[Figure: distribution of differences on a z scale, centred at 0. Lower tail: only a small probability (2.5%) of getting a result this small. Centre: result could easily have arisen if there was no real difference between groups. Upper tail: only a small probability (2.5%) of getting a result this large.]

What happens if the distribution of differences changes a little?
[Figure: the changed distribution of differences, centred at 0. Lower tail: a much larger probability of getting a result this small. Centre: result could easily have arisen if there was no difference between groups. Upper tail: a much larger probability of getting a result this high!]

What is a distribution?
The complete summary of the frequencies of the values or categories of a measurement made on a group of subjects
The distribution shows either how many or what proportion of the group was found to have each value, or a range of values, out of all possible values
The pattern of variation of a variable is called its distribution, which can be described both mathematically and graphically.
Last J.M. A dictionary of epidemiology. Oxford University

Types of variable used here
Continuous
  From 1 to ∞
  Ex: Weight, HgB count.
Discrete
  Finite number
  Ex: # of heads & tails in a coin flip.

Types of Distributions
Binomial
Poisson
Gamma
Normal
t-distribution
Exponential
F-distribution
Chi-squared distribution
Hypergeometric
Laplace

Binomial Distribution
A random sequence of n (fixed) Bernoulli trials
For each individual trial:
  Only 2 possible outcomes (yes / no, heads / tails)
  Outcome of each trial is independent
  Probability of each outcome does not change over time
Among the most frequently encountered distributions in statistics
Applies to a fixed number of trials, where each trial results in a success with probability p and a failure with probability 1-p
Probability Mass Function (x = number of successes)
$$p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$
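As a rough illustration, the binomial mass function can be evaluated directly with Python's standard library. This is only a sketch: the function name `binomial_pmf` is made up for the example, and the parameters n=10, p=.15 are borrowed from one of the plots on the next slide.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Parameters borrowed from one of the slide's plots (n=10, p=.15).
n, p = 10, 0.15
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
for x, prob in enumerate(pmf):
    print(f"P(X={x}) = {prob:.4f}")
print("total probability:", round(sum(pmf), 10))
```

The probabilities over x = 0, ..., n add up to 1, as any mass function must.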

Shape of Binomial Distribution
[Figure: four panels plotting p(x) against x, for n=10 p=.15, n=50 p=.15, n=10 p=.5 and n=50 p=.5]

Shape depends greatly on the size of n

Poisson Distribution
Important and widely used
Used to model the number of random occurrences of an event in a continuous interval of time or space
Examples:
  Patients arriving at the ER
  Number of accidents of a given type
  Counts of live or dead organisms
  Particle emissions from a radioactive source
  Calls arriving at a switchboard

Poisson Distribution
Let λ = the average number of times that a repeated event occurs per unit of time or space under inspection
λ determines the shape of the Poisson distribution
Example: Emergencies at Centro Médico
  λ = 1.97 per day or λ = 13.8 per week

Poisson Distribution
Probability Mass Function (x = number of events)
$$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$
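A minimal sketch of the Poisson mass function in Python, evaluated at the two rates from the Centro Médico example. The name `poisson_pmf` and the cut-off at 60 events are choices made for this illustration only.

```python
from math import exp, factorial

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Poisson(lam): lam^x * e^(-lam) / x!."""
    if x < 0:
        return 0.0
    return lam**x * exp(-lam) / factorial(x)

# Rates quoted on the slide: 1.97 emergencies per day, 13.8 per week.
for lam in (1.97, 13.8):
    probs = [poisson_pmf(x, lam) for x in range(61)]
    most_likely = max(range(61), key=lambda x: probs[x])
    print(f"lambda={lam}: P(X=0)={probs[0]:.4f}, most likely count={most_likely}")
```

A small rate piles the probability near zero, while a larger rate moves the peak out and makes the shape more symmetric.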

[Figure: two panels plotting p(x) against x from 0 to 30, for λ=1.97 and λ=13.8]

Relationship between Binomial and Poisson Distribution
When n is large and p is small, a Poisson distribution can be used to approximate a Binomial distribution by letting λ = np
Example
  Setting up a new burns unit for all incidents involving children. To help decide on resource allocation we need to know the expected probabilities of the various numbers of patients admitted to the unit per day.
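The approximation can be checked numerically. The sketch below compares the two mass functions side by side. The inputs are hypothetical and not from the slides: 1000 children at risk, each with a 0.002 chance of admission on a given day, so that np = 2.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return lam**x * exp(-lam) / factorial(x)

# Hypothetical inputs, not from the slides: 1000 children at risk, each with
# a 0.002 chance of admission on a given day, so lambda = n * p = 2.
n, p = 1000, 0.002
lam = n * p

max_gap = 0.0
for x in range(9):
    exact = binomial_pmf(x, n, p)
    approx = poisson_pmf(x, lam)
    max_gap = max(max_gap, abs(exact - approx))
    print(f"x={x}: binomial={exact:.5f}  poisson={approx:.5f}")
print(f"largest gap over x=0..8: {max_gap:.5f}")
```

With n this large and p this small the two columns agree closely, which is why only the expected daily count is needed to plan for the unit.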

Gamma Distribution
Takes complex and varied shapes
Provides a fairly flexible class for modeling
Other known distributions (e.g. Exponential) are special cases of the Gamma distribution
Other important and regularly used special cases include the chi-squared distribution
Density Function depends on 2 parameters

Gamma Distribution
Shape of various Gamma Distributions
[Figure: density f(x) against x for four parameter pairs: α=4 β=1, α=2 β=1, α=1 β=1 and α=0.5 β=1]

Continuous Distributions
Statistical distributions that may take on a continuous range of values
Have a mathematical equation called a Density Function, f(x), for an outcome
Sometimes called a Continuous Probability Function
f(x) must satisfy:
$$P[a \le x \le b] = \int_a^b f(x)\,dx$$
$$f(x) \ge 0 \text{ for all real } x$$
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$

What does this mean?
Density functions are defined for an infinite number of points over a continuous interval
The area under the curve between 2 distinct points defines the probability that an outcome falls in that interval
Probabilities are measured over intervals and not single points
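To make "area under the curve" concrete, here is a small sketch that integrates a density numerically. It uses the standard normal density as the example and a plain trapezoid rule. Both are choices made for this illustration, and `prob_between` is a hypothetical helper name.

```python
from math import exp, pi, sqrt

def standard_normal_density(x: float) -> float:
    return exp(-x * x / 2) / sqrt(2 * pi)

def prob_between(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f from a to b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

print(prob_between(standard_normal_density, -1.96, 1.96))  # close to 0.95
print(prob_between(standard_normal_density, 1.0, 1.0))      # a single point has probability 0
print(prob_between(standard_normal_density, -10, 10))       # total area, close to 1
```

The first result leaves about 2.5% in each tail, which is the situation drawn on the hypothesis testing slide.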

Discrete Distributions
A statistical distribution that can only take a finite or countable number of values
Can define a mathematical equation called a Probability Mass Function, p(x)
The probability that X takes a specific value x is p(x)
p(x) must satisfy:
$$p(x_i) = P[X = x_i]$$
$$p(x_i) \ge 0 \text{ for all } x_i$$
$$\sum_i p(x_i) = 1$$

Example of Density Function f(x)
[Figure: a density function f(x) plotted for x from -10 to 10]
It is now only sensible to talk about the probability of an observation falling in an interval

Probability Mass Function
A coin is tossed 3 times
All possible outcomes are HHH, HHT, HTT, HTH, TTH, THT, THH and TTT
If x = number of heads after the 3 tosses then
  P(x=0) = 1/8
  P(x=1) = 3/8
  P(x=2) = 3/8
  P(x=3) = 1/8
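The same table can be built by brute force. This sketch lists all eight sequences and counts heads, assuming a fair coin so that every sequence is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 2^3 equally likely sequences of three tosses.
outcomes = list(product("HT", repeat=3))
heads_counts = Counter(seq.count("H") for seq in outcomes)

# p(x) = (number of sequences with x heads) / (number of sequences)
pmf = {x: Fraction(count, len(outcomes)) for x, count in sorted(heads_counts.items())}
for x, prob in pmf.items():
    print(f"P(x={x}) = {prob}")
```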

Bernoulli Random Variable
Outcome can take on only 2 values, with probability p and 1-p
Example - Yes / No, Heads / Tails
Probability Mass Function
$$p(1) = p$$
$$p(0) = 1 - p$$
$$p(x) = 0 \text{ if } x \ne 0 \text{ or } 1$$

Exponential Distribution
Can be used to model waiting times or lifetimes
Shape depends on a single parameter λ > 0
1/λ = mean waiting time, where λ is the rate of events per unit of time
Examples
  Waiting time at the ER
  Survival time of cancer patients
  Working lifetime of a machine

Exponential Distribution
Density Function
$$f(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$
[Figure: density f(x) against x for λ=0.5, λ=1 and λ=2]
It has a mean of 1/λ and a variance of 1/λ²
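A short sketch of the exponential density and its cumulative probability, using the rate 0.5 from the plot. The function names are invented for the example, and the closed form for P(X ≤ x) is the standard integral of the density rather than something stated on the slide.

```python
from math import exp

def exponential_pdf(x: float, lam: float) -> float:
    """f(x) = lam * e^(-lam * x) for x >= 0, and 0 for x < 0."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def exponential_cdf(x: float, lam: float) -> float:
    """P(X <= x) = 1 - e^(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

lam = 0.5                      # one of the rates shown in the slide's plot
mean_wait = 1 / lam            # mean of the distribution
p_longer = 1 - exponential_cdf(mean_wait, lam)
print(f"mean wait = {mean_wait}, P(wait > mean) = {p_longer:.4f}")
```

Whatever the rate, the chance of waiting longer than the mean is e^(-1), roughly 37%, which reflects how skewed the distribution is.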

Normal Distribution
Plays a central role as many statistical tests assume an outcome has a normal distribution
Shape has a single peak and is symmetric about μ
Spread is described by σ
Many Examples:
  A person's height
  IQ scores
  Blood metabolites

Normal Distribution
Density Function
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / 2\sigma^2}$$
[Figure: density f(x) against x for μ=0 σ=1, μ=0 σ=2 and μ=2 σ=0.5]
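For illustration, the normal density can be coded in a few lines. The sketch evaluates the height of the peak for the three parameter pairs in the plot legend. `normal_pdf` is just a name chosen here.

```python
from math import exp, pi, sqrt

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2 * pi))

# Parameter pairs taken from the slide's plot legend.
for mu, sigma in [(0, 1), (0, 2), (2, 0.5)]:
    print(f"mu={mu}, sigma={sigma}: peak height f(mu) = {normal_pdf(mu, mu, sigma):.4f}")
```

Halving σ doubles the height of the peak, because the total area under the curve has to stay at 1.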

Relationship between Normal and other distributions
The normal distribution is often a good approximation to a discrete distribution when the discrete distribution takes a symmetric bell shape
Some distributions converge to the normal distribution as their parameters approach certain limits
Binomial tends to Normal as n → ∞
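The binomial case can be checked numerically. The sketch below compares exact binomial probabilities with a normal distribution that has the same mean np and standard deviation sqrt(np(1-p)). The continuity correction of 0.5 and the particular n and k values are choices made for this example, not something taken from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))

def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) from Normal(np, sqrt(np(1-p))), with a continuity correction of 0.5."""
    approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

p = 0.5
for n, k in [(10, 4), (50, 20), (200, 80)]:
    exact = binomial_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(f"n={n}, k={k}: exact={exact:.4f}  normal approx={approx:.4f}")
```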

Distributions used in Analysis
Distributions are used in statistical tests to calculate significance
Examples
  Chi-Squared Distribution
  t-Distribution
  F-Distribution
Shape based on degrees of freedom

The t-statistic: (x1 - x2) / sd(x1 - x2)
[Figure: a normal distribution (sd known) overlaid with a t-distribution, which takes into account the error in the estimate of the sample variance. Under the t-distribution there is a much larger probability of getting a result this small or this high; a result near 0 could easily have arisen if there was no difference between groups.]
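A sketch of how such a statistic might be computed, reading x1 and x2 as the two group means. That reading is an assumption, as is the unpooled estimate of the standard deviation of the difference, since the slide does not say how sd is obtained. The data are made up.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(group1, group2) -> float:
    """Difference in sample means divided by its estimated standard deviation.

    Assumption (not stated on the slide): the standard deviation of the
    difference is estimated without pooling, as sqrt(s1^2/n1 + s2^2/n2).
    """
    se = sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))
    return (mean(group1) - mean(group2)) / se

# Hypothetical measurements for two groups.
treated = [5.1, 4.9, 5.6, 5.8, 6.0]
control = [4.2, 4.8, 4.4, 5.0, 4.6]
print(f"t = {two_sample_t(treated, control):.3f}")
```

Because the variances are themselves estimated from small samples, the result is compared against a t-distribution rather than a normal one.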

Checking a Distribution
Graphs can display the shape of your distribution
Some graphs to use
  Histogram
  Q-Q Plot, which can check your data against many theoretical distributions
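The points of a Q-Q plot are simple to compute by hand. The sketch below pairs each sorted observation with the matching quantile of a normal distribution fitted to the sample. The plotting positions (i - 0.5)/n are one common convention among several, and the sample is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Pair each sorted observation with the matching normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention among several.
    """
    ordered = sorted(data)
    n = len(ordered)
    reference = NormalDist(mean(ordered), stdev(ordered))
    return [(reference.inv_cdf((i - 0.5) / n), obs)
            for i, obs in enumerate(ordered, start=1)]

# Hypothetical right-skewed sample, e.g. lengths of stay in days.
sample = [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 8]
for theoretical, observed in normal_qq_points(sample):
    print(f"{theoretical:6.2f}  {observed}")
```

If the pairs fall close to the line y = x the normal model is plausible. Systematic curvature, as a skewed sample tends to produce, suggests trying another distribution.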

Why is the Distribution important?
Example
  Want probabilities on Length of Stay at ER for Asthma, i.e. P(LOS > 3 days)
  Have Length of Stay (LOS) data for Asthma from July 1997 to June 1998
  Mean = 2.02 days with SD = 1.66 days

Example (continued)
If we assume the data are Normally Distributed then we can use the Mean and SD with the Normal Density Function to calculate LOS probabilities
Therefore we can estimate various LOS probabilities
  P(LOS ≤ 2 days) = 38%
  P(LOS ≥ 4 days) = 28%
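A minimal sketch of this calculation with Python's `statistics.NormalDist`. The slide does not spell out which intervals lie behind its two figures. The readings used below are assumptions, chosen because they give roughly 38% and 28%: a stay between 0 and 2 days, and a stay of more than 3 days (that is, at least 4 whole days).

```python
from statistics import NormalDist

# Mean and SD of length of stay quoted on the slides.
los_model = NormalDist(mu=2.02, sigma=1.66)

# Assumed readings of the two slide probabilities (see lead-in).
p_short = los_model.cdf(2) - los_model.cdf(0)   # 0 <= LOS <= 2 days
p_long = 1 - los_model.cdf(3)                   # LOS > 3 days, i.e. at least 4 whole days
p_negative = los_model.cdf(0)                   # mass the model puts on impossible stays

print(f"P(0 <= LOS <= 2) = {p_short:.1%}")
print(f"P(LOS > 3)       = {p_long:.1%}")
print(f"P(LOS < 0)       = {p_negative:.1%}")
```

The last line is worth a look: with this mean and SD a normal model puts roughly a tenth of its probability on negative lengths of stay, an early hint that the normal assumption fits these data poorly.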

Example (continued)
How do our estimated probabilities compare with the observed LOS data?
Observed proportion with LOS ≤ 2 days was 59%
  Our estimated probability was 38%
Observed proportion with LOS ≥ 4 days was 19%
  Our estimated probability was 28%

Example
[Figure: histogram of observed Length of Stay (LOS) data]

Example
Q-Q plots showed that a Gamma Distribution with α=1.48 β=0.73 was a good approximation for our LOS data
We can now calculate estimated probabilities of LOS
  P(LOS ≤ 2 days) = 60% (Observed = 59%)
  P(LOS ≥ 4 days) = 22% (Observed = 19%)
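These gamma probabilities can be approximated with the standard library alone. The sketch below treats the second parameter, 0.73, as a rate. That is an assumption, made because it gives a model mean of about 2.03 and SD of about 1.67, close to the sample values of 2.02 and 1.66. The cumulative probability uses the usual power series for the regularized lower incomplete gamma function, and `gamma_cdf` is a name invented for the example.

```python
from math import exp, lgamma, log

def gamma_cdf(x: float, shape: float, rate: float) -> float:
    """P(X <= x) for a Gamma(shape, rate) variable, via the power series
    for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    y = rate * x
    term = 1.0 / shape
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= y / (shape + k)
        total += term
        k += 1
    return total * exp(shape * log(y) - y - lgamma(shape))

# Slide parameters, with the second one read as a rate (an assumption).
shape, rate = 1.48, 0.73
print(f"model mean = {shape / rate:.2f}, model SD = {shape ** 0.5 / rate:.2f}")
print(f"P(LOS <= 2) = {gamma_cdf(2, shape, rate):.1%}")
print(f"P(LOS > 3)  = {1 - gamma_cdf(3, shape, rate):.1%}")
```

Under these assumptions the two probabilities come out near 60% and 22%, in line with the slide, with "more than 3 days" again standing in for "at least 4 whole days". Setting shape to 1 reduces the function to the exponential case, 1 - e^(-rate·x).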

Example
Graph of Observed proportions of LOS and estimated probabilities from a Normal and Gamma distribution
[Figure: "LOS at WCH for Asthma 7/97 to 6/98", vertical axis from 0 to 40, horizontal axis LOS (days), with series Observed, Normal and Gamma]

Not checking your distribution and assuming a normal distribution can produce misleading results!
