Discrete Probability Distributions

A CEE3030 lecture prepared by


Gilberto E. Urroz
February 2006
Reference
The subjects presented are taken from the Maple
worksheet entitled
DiscreteProbabilityDistributions
available for download in the class schedule
Quick review of concepts for discrete
random variables - 1

Let X be a discrete random variable, then


f(x) = P(X=x) is the probability mass function (pmf)

$F(x) = P(X \le x) = \sum_{u \le x} f(u)$ is the cumulative distribution function (CDF)

Calculation of probabilities (X integer-valued)

P(X < x) = F(x-1) -- P(X ≤ x) = F(x)

P(X > x) = 1-F(x) -- P(X ≥ x) = 1-F(x-1)

P(a < X < b) = F(b-1)-F(a)

P(a ≤ X < b) = F(b-1)-F(a-1)

P(a < X ≤ b) = F(b)-F(a)

P(a ≤ X ≤ b) = F(b)-F(a-1)
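
The CDF identities above can be checked numerically. What follows is a minimal Python sketch (Python is used here only as a cross-check; the lecture itself works in Maple). It assumes an integer-valued X with a small made-up pmf; the names `pmf` and `F` and the probability values are illustrative and are not taken from the worksheet.

```python
# Hypothetical pmf of an integer-valued random variable X (values sum to 1).
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

def F(x):
    """CDF: F(x) = P(X <= x) = sum of f(u) over all u <= x."""
    return sum(prob for u, prob in pmf.items() if u <= x)

a, b = 1, 3

# Interval probabilities from the CDF identities.
p_open = F(b - 1) - F(a)          # P(a <  X <  b)
p_closed = F(b) - F(a - 1)        # P(a <= X <= b)

# The same probabilities obtained by summing the pmf directly.
direct_open = sum(p for u, p in pmf.items() if a < u < b)
direct_closed = sum(p for u, p in pmf.items() if a <= u <= b)

print(p_open, direct_open)
print(p_closed, direct_closed)
```

Both routes should agree up to floating-point rounding, which is the point of the identities: every interval probability reduces to one or two CDF evaluations.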
Quick review of concepts for discrete
random variables - 2

Let X be a discrete random variable, then


f(x) = P(X=x) is the probability mass function (pmf)

Calculation of measures

Mean: $\mu = \sum_{i=1}^{n} x_i f(x_i)$

Variance: $\sigma^2 = \sum_{i=1}^{n} (x_i - \mu)^2 f(x_i)$

Skewness: $\alpha_3 = \frac{1}{\sigma^3} \sum_{i=1}^{n} (x_i - \mu)^3 f(x_i)$

Kurtosis: $\alpha_4 = \frac{1}{\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^4 f(x_i)$
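
As a rough illustration of these four sums, the Python sketch below evaluates them for a small made-up pmf. The pmf values and the helper name `central_moment` are assumptions chosen for the example, not part of the lecture's Maple worksheet.

```python
import math

# Hypothetical pmf: values x_i mapped to probabilities f(x_i).
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}

def central_moment(k, mu):
    """k-th central moment: sum of (x_i - mu)^k * f(x_i)."""
    return sum((x - mu) ** k * f for x, f in pmf.items())

mu = sum(x * f for x, f in pmf.items())       # mean
var = central_moment(2, mu)                   # variance
sigma = math.sqrt(var)                        # standard deviation
skew = central_moment(3, mu) / sigma ** 3     # skewness
kurt = central_moment(4, mu) / sigma ** 4     # kurtosis

print(mu, var, skew, kurt)
```

Because this pmf is symmetric about x = 2, the skewness comes out as zero up to rounding.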
Discrete distributions in Maple

Use the command: ?Statistics,Distributions for a


list of available distributions

Discrete distributions of interest are:

Bernoulli              Bernoulli distribution
Binomial               binomial distribution
DiscreteUniform        discrete uniform distribution
EmpiricalDistribution  empirical distribution
Geometric              geometric distribution
Hypergeometric         hypergeometric distribution
NegativeBinomial       negative binomial (Pascal) distribution
Poisson                Poisson distribution
ProbabilityTable       probability table


Using Maple Statistics package to
define a discrete random variable

To load the Statistics package use: with(Statistics)

Use ?<distribution name> for help


e.g., ?Geometric

Define a random variable with distribution name


and appropriate parameters with function
RandomVariable
e.g., X := RandomVariable(Binomial(n,p))
e.g., X := RandomVariable(Poisson(3.2))
Calculating measures of a
distribution - 1

After defining a random variable X in Maple, you


can calculate the following measures:

μ := Mean(X)

σ² := Variance(X)

σ := StandardDeviation(X)

α3 := Skewness(X)

α4 := Kurtosis(X)
Calculating measures of a
distribution - 2

To obtain floating-point (decimal) results for the


measures of a distribution you may use:

μ := evalf(Mean(X))

σ² := evalf(Variance(X))

σ := evalf(StandardDeviation(X))

α3 := evalf(Skewness(X))

α4 := evalf(Kurtosis(X))
Calculating probabilities - 1

To calculate probabilities use the following basic


functions:
ProbabilityFunction(X,a) for the pmf, i.e., f(a)=P(X=a)

CDF(X,a) for the CDF, i.e., F(a) = P(X ≤ a)


Calculating probabilities - 2

To calculate more complex probabilities use


function CDF as follows:
P(X < x) = F(x-1) => use CDF(X,x-1)

P(X > x) = 1-F(x) => use 1-CDF(X,x)

P(X ≥ x) = 1-F(x-1) => use 1-CDF(X,x-1)

P(a < X < b) = F(b-1)-F(a) => use CDF(X,b-1)-CDF(X,a)

P(a ≤ X < b) = F(b-1)-F(a-1) => use CDF(X,b-1)-CDF(X,a-1)

P(a < X ≤ b) = F(b)-F(a) => use CDF(X,b)-CDF(X,a)

P(a ≤ X ≤ b) = F(b)-F(a-1) => use CDF(X,b)-CDF(X,a-1)

The Bernoulli distribution

Random variable X can take only the values x = 0


and x = 1

Probability mass function:

$f(x) = p^x (1-p)^{1-x}$, for x = 0, 1, with 0 < p < 1

Possible association of the values of x:

Variable X         X=0 equivalent    X=1 equivalent
Binary logical     No                Yes
Voltage level      Low voltage       High voltage
Success/failure    Failure           Success
Measures of the Bernoulli distribution

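Mean: $\mu = p$

Variance: $\sigma^2 = p(1-p)$

Standard deviation: $\sigma = \sqrt{p(1-p)}$

Skewness: $\alpha_3 = \frac{1-2p}{\sqrt{p(1-p)}}$

Kurtosis: $\alpha_4 = \frac{1-3p+3p^2}{p(1-p)}$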
The binomial distribution: X~B(n,p)

Consider n repetitions of a Bernoulli process with


parameter p

Let X = number of successes in n repetitions

Probability mass function:
$f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, for x = 0, 1, ..., n

Binomial coefficient:
$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$
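
The binomial pmf can be evaluated directly from this definition. The Python sketch below is one way to do so outside Maple, using `math.comb` for the binomial coefficient; the parameters n = 10 and p = 0.3 are arbitrary illustrative choices.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1-p)^(n-x), for x = 0, 1, ..., n."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.3   # illustrative parameters
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(probs)                               # should be 1
mean = sum(x * f for x, f in enumerate(probs))   # should equal n*p

print(total, mean)
```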
Measures of the binomial distribution

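Mean: $\mu = np$

Variance: $\sigma^2 = np(1-p)$

Standard deviation: $\sigma = \sqrt{np(1-p)}$

Skewness: $\alpha_3 = \frac{1-2p}{\sqrt{np(1-p)}}$

Kurtosis: $\alpha_4$ = long expression, see worksheet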
Approximating the binomial distribution
with the normal distribution, X~N(μ,σ)

Applies for relatively large values of n and
values of p such that
np ≥ 5 and n(1-p) ≥ 5

Use X := RandomVariable(Normal(μ,σ)), with μ = np and
σ = √(np(1-p)), to define
a normal random variable (continuous)

Main reason for the approximation: to avoid
calculating large factorial values. This is no longer an
impediment with modern calculators and software
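
To get a feel for how close the approximation is, the Python sketch below compares an exact binomial CDF value with the normal CDF for the same mean and standard deviation. The parameters (n = 100, p = 0.4, x = 45) are made up for illustration, and the half-unit continuity correction is a common convention assumed here; it is not stated in the slides.

```python
from math import comb, erf, sqrt

def binomial_cdf(x, n, p):
    """Exact P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma), written with the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

n, p = 100, 0.4                 # illustrative: np = 40, n(1-p) = 60
mu = n * p
sigma = sqrt(n * p * (1 - p))

x = 45
exact = binomial_cdf(x, n, p)
approx = normal_cdf(x + 0.5, mu, sigma)   # +0.5: continuity correction (assumption)

print(exact, approx)
```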
The Poisson distribution

Used to define a discrete random variable X =
number of occurrences of a certain phenomenon per
unit time, unit length, etc.

Probability mass function:
$f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$, for x = 0, 1, ...

Parameter λ represents the average number of
occurrences per unit time, length, etc.
Measures of the Poisson distribution

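Mean: $\mu = \lambda$

Variance: $\sigma^2 = \lambda$

Standard deviation: $\sigma = \sqrt{\lambda}$

Skewness: $\alpha_3 = \frac{1}{\sqrt{\lambda}}$

Kurtosis: $\alpha_4 = \frac{3\lambda+1}{\lambda}$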
Poisson distribution with scaling

Let X = number of occurrences of a phenomenon,


say, per unit time

Let ν = average number of occurrences per unit time

Let T = period of interest for the analysis

Use λ = νT as the parameter in the Poisson distribution

See example of scaling in worksheet
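
A small numerical illustration of the scaling idea, written as a Python sketch rather than in Maple: the rate of 2 occurrences per hour and the 3-hour period are hypothetical numbers, and `rate`, `T`, and `lam` are names chosen for this example only.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """f(x) = exp(-lam) * lam^x / x!, for x = 0, 1, ..."""
    return exp(-lam) * lam ** x / factorial(x)

rate = 2.0        # hypothetical average occurrences per hour
T = 3.0           # hypothetical period of interest, in hours
lam = rate * T    # Poisson parameter for the whole period

p_none = poisson_pmf(0, lam)                                # no occurrences in T
p_at_most_4 = sum(poisson_pmf(x, lam) for x in range(5))    # P(X <= 4) in T

print(lam, p_none, p_at_most_4)
```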


Approximating the binomial distribution
with the Poisson distribution

Applies for np ≤ 5 or n(1-p) ≤ 5

Use X := RandomVariable(Poisson(λ)), with λ = np, to define a
Poisson random variable (discrete)

Main reason for the approximation: to avoid
calculating large factorial values. This is no longer an
impediment with modern calculators and software

Read details in worksheet
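
The quality of the Poisson approximation can be inspected by comparing the two pmfs term by term. The Python sketch below does this for illustrative values n = 1000 and p = 0.002 (so np = 2); these numbers are assumptions for the example and do not come from the worksheet.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Exact binomial pmf."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson pmf with parameter lam."""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 1000, 0.002    # illustrative: large n, small p
lam = n * p           # Poisson parameter matching the binomial mean

# Largest pointwise difference between the two pmfs over x = 0..10.
max_diff = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(11))

print(lam, max_diff)
```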


Approximating the Poisson distribution with
the normal distribution

Similar to the approximation of the binomial


distribution with the normal distribution

Main reason for the approximation: to avoid
calculating exponential functions in the Poisson
distribution. This is no longer an impediment with
modern calculators and software

Read details in worksheet


The geometric distribution

Consider several repetitions of a Bernoulli process


with parameter p

Let X = number of repetitions required for the first


success

Probability mass function



$f(x) = (1-p)^{x-1} p$, for x = 1, 2, ...
Measures of the geometric distribution

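Mean: $\mu = \frac{1}{p}$ for X = 1, 2, ... as defined above (counting only the failures before the first success, X - 1, gives $\frac{1-p}{p}$)

Variance: $\sigma^2 = \frac{1-p}{p^2}$

Standard deviation: $\sigma = \frac{\sqrt{1-p}}{p}$

Skewness: $\alpha_3 = \frac{2-p}{\sqrt{1-p}}$

Kurtosis: $\alpha_4 = \frac{p^2-9p+9}{1-p}$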
Period of return - 1
Let X_i = maximum value of an event in period i; the X_i are
independent random variables
Let q = P(X_i < x) = probability of non-exceedance of
value x in period i
Let p = P(X_i > x) = probability of exceedance of
value x in period i, thus q+p = 1, q = 1-p

Let T = number of periods elapsed until the value
x is first exceeded
$P(T=t) = P(X_1<x)\,P(X_2<x)\cdots P(X_{t-1}<x)\,P(X_t>x) = q^{t-1} p = (1-p)^{t-1} p = f(t)$, a pmf

T~geometric(p)
Period of return - 2

Expected value of the geometric distribution with
parameter p:
$E(T) = \frac{1}{p} = \frac{1}{P(X>x)}$

Example: let X = magnitude of an annual flood,
with p = P(X>x) = 0.010 for x = 500 cfs, then
E(T) = 1/0.010 = 100 years

Thus, the period of return of a 500-cfs flood is 100
years, or the 100-year flood is 500 cfs
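
The result E(T) = 1/p can be confirmed numerically by summing the series for the expected value of the geometric pmf. The Python sketch below uses the flood example's p = 0.010; truncating the sum at 5000 terms is an arbitrary choice made here, adequate because the remaining terms are negligible.

```python
p = 0.010   # annual exceedance probability from the flood example

# Closed-form period of return.
expected_T = 1.0 / p

# Same quantity from the definition E(T) = sum of t * (1-p)^(t-1) * p,
# truncated at 5000 terms (the neglected tail is vanishingly small).
series_T = sum(t * (1 - p) ** (t - 1) * p for t in range(1, 5001))

print(expected_T, series_T)
```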
The hypergeometric
distribution

Consider a finite population of size N
with a objects of a given type
Draw a sample of size n
Let X = number of objects
of that type in the sample
Probability mass function:
$f(x) = \frac{\binom{a}{x} \binom{N-a}{n-x}}{\binom{N}{n}}$

Mean: $\mu = \frac{na}{N}$

Variance: $\sigma^2 = \frac{na(N-a)(N-n)}{N^2 (N-1)}$
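
The mean and variance formulas for the hypergeometric distribution can be checked against direct sums over the pmf. The Python sketch below does so for made-up values N = 20, a = 8, n = 5; the function name `hypergeom_pmf` is an assumption for this example.

```python
from math import comb

def hypergeom_pmf(x, N, a, n):
    """f(x) = C(a, x) * C(N-a, n-x) / C(N, n)."""
    return comb(a, x) * comb(N - a, n - x) / comb(N, n)

N, a, n = 20, 8, 5   # illustrative population, marked objects, sample size

# Feasible values of x: cannot exceed a or n, nor leave too few unmarked objects.
xs = range(max(0, n - (N - a)), min(a, n) + 1)

mean = sum(x * hypergeom_pmf(x, N, a, n) for x in xs)
var = sum((x - mean) ** 2 * hypergeom_pmf(x, N, a, n) for x in xs)

mean_formula = n * a / N
var_formula = n * a * (N - a) * (N - n) / (N ** 2 * (N - 1))

print(mean, mean_formula)
print(var, var_formula)
```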
The discrete uniform distribution

Let X = random variable taking the values x = a,


a+1, ..., b, each value with equal probability

The probability mass function is
$f(x) = \frac{1}{b-a+1}$, for x = a, a+1, ..., b

Mean: μ = (a+b)/2

Variance: σ² = (a-b)(a-b-2)/12
Inverse cumulative distribution function

The CDF of a random variable X is defined as F(x)
= P(X ≤ x).

Given a probability p = F(x), the value of x is
defined as
$x = F^{-1}(p)$

$F^{-1}$ is the inverse cumulative distribution function
(ICDF) of X

Example - ICDF for the exponential
distribution (continuous variable case)

The probability density function (pdf) for this case
is given by
$f(x) = \lambda e^{-\lambda x}$, x ≥ 0

The corresponding cumulative distribution
function (CDF) is
$F(x) = 1 - e^{-\lambda x}$

For p = F(x), the ICDF is given by
$F^{-1}(p) = -\ln(1-p)/\lambda$
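
A quick way to check the exponential ICDF is to verify that it undoes the CDF. The Python sketch below does this for an arbitrary illustrative rate of 0.5; the names `lam`, `cdf`, and `icdf` are chosen for this example only.

```python
from math import exp, log

lam = 0.5   # illustrative rate parameter of the exponential distribution

def cdf(x):
    """F(x) = 1 - exp(-lam * x), for x >= 0."""
    return 1.0 - exp(-lam * x)

def icdf(p):
    """F^{-1}(p) = -ln(1 - p) / lam, for 0 <= p < 1."""
    return -log(1.0 - p) / lam

x_median = icdf(0.5)          # value of x with F(x) = 0.5
round_trip = cdf(icdf(0.9))   # should recover 0.9

print(x_median, round_trip)
```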
ICDF and Maple function Quantile

For a discrete random variable X the p quantile is
defined by
$Q(p) = \inf\{x \mid F(x) \ge p\}$
i.e., the smallest value of x such that F(x) is
greater than or equal to p. This is calculated using
Maple's function Quantile(X,p)

If X takes only integer values, the ICDF for X is
calculated using Maple's function Quantile as
$F^{-1}(p)$ = Quantile(X,p) - 1
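
The definition Q(p) = inf{x | F(x) ≥ p} can be implemented as a direct search over the support. The Python sketch below does this for a binomial variable with illustrative parameters n = 10 and p = 0.5. It implements only the textbook definition and makes no claim about how Maple's Quantile behaves internally.

```python
from math import comb

def binomial_cdf(x, n, p):
    """F(x) = P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def quantile(prob, n, p):
    """Smallest integer x in 0..n with F(x) >= prob."""
    for x in range(n + 1):
        if binomial_cdf(x, n, p) >= prob:
            return x
    return n   # guard against rounding when prob is extremely close to 1

n, p = 10, 0.5   # illustrative parameters
q_half = quantile(0.5, n, p)
q_low = quantile(0.3, n, p)

print(q_half, q_low)
```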
Fitting a distribution to a sample
X_s = {x_1, x_2, ..., x_ns}, numerical sample of size ns.

Mean of the sample:
$x_{mean} = \frac{1}{ns} \sum_{i=1}^{ns} x_i$

Variance of the sample:
$s^2 = \frac{1}{ns-1} \sum_{i=1}^{ns} (x_i - x_{mean})^2$

Select a distribution, set μ = x_mean and σ² = s²,
and solve for the parameters of the distribution
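
As an example of the fitting procedure, the Python sketch below fits a binomial distribution to a small made-up sample. Setting np equal to the sample mean and np(1-p) equal to the sample variance gives p = 1 - s²/x_mean and n = x_mean/p. The sample values are invented for illustration, and the choice of the binomial is an assumption of this example.

```python
from statistics import mean, variance

# Hypothetical sample of counts (not data from the worksheet).
xs = [3, 5, 4, 6, 2, 4, 5, 3, 4, 4]

x_mean = mean(xs)     # sample mean
s2 = variance(xs)     # sample variance, with the ns - 1 divisor

# Match moments of B(n, p): mu = n*p and sigma^2 = n*p*(1 - p).
p_hat = 1 - s2 / x_mean
n_hat = x_mean / p_hat

print(x_mean, s2, p_hat, n_hat)
```

In practice n must be an integer, so the fitted value would be rounded; this sample was chosen so that the result is already very close to a whole number.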
Random numbers

Numbers generated by random processes, e.g.,
numbers from a roulette wheel or a lottery

Computers use deterministic algorithms that


produce pseudo-random numbers

Use Maple function Sample(X,ns), within package


Statistics, to produce a sample (vector) of size ns
for the random variable X, e.g., Xs:= Sample(X,ns)

To convert from a vector to a list, use:


convert(Xs,list)
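
For readers without Maple, the same idea can be sketched in Python with the standard `random` module. The function `sample_binomial` below is a hypothetical helper written for this example (it is not a Maple or Python library function); it builds each binomial value from n Bernoulli trials and uses a fixed seed so that the pseudo-random sample is reproducible.

```python
import random

def sample_binomial(n, p, ns, seed=0):
    """Return a list of ns pseudo-random values from B(n, p).

    Each value counts the successes in n Bernoulli trials.
    """
    rng = random.Random(seed)   # deterministic generator: same seed, same sample
    return [sum(1 for _ in range(n) if rng.random() < p) for _ in range(ns)]

xs = sample_binomial(10, 0.3, 1000)
sample_mean = sum(xs) / len(xs)   # should be near n*p = 3

print(xs[:10], sample_mean)
```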
Statistical simulation or Monte Carlo
simulation

Generating synthetic data out of a given


distribution to use as input for a model

Example 1 - generating precipitation data for a


hydrological model

Example 2 - generating hydraulic conductivity


data for an aquifer in groundwater simulation

Example 3 - generating traffic data for a highway


operation simulation
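
A very small Monte Carlo experiment, sketched in Python, illustrates the idea using the period-of-return setting from earlier in the lecture: each simulated year exceeds the flood level with probability p = 0.010, and the average waiting time over many trials should be close to 1/p = 100 years. The seed, the number of trials, and the function name are arbitrary choices made for this example.

```python
import random

def periods_until_exceedance(p, rng):
    """Count periods until the first exceedance (probability p per period)."""
    t = 1
    while rng.random() >= p:   # no exceedance in this period, keep waiting
        t += 1
    return t

rng = random.Random(2006)   # fixed seed for a reproducible pseudo-random run
p = 0.010                   # annual exceedance probability
trials = 10000

waits = [periods_until_exceedance(p, rng) for _ in range(trials)]
mean_wait = sum(waits) / trials   # Monte Carlo estimate of E(T)

print(mean_wait)
```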
