In this lesson, we will discuss continuous random variables and their
probability distributions. We will start with the general setup for continuous
distributions and then continue with two forms of the latter, namely, the
normal and sampling distributions. It is very important that you understand
the material discussed here, as it constitutes the theoretical background to
inferential statistics.
Discussion
A continuous random variable is a random variable that can take any value
contained in one or more intervals (i.e., an uncountable number of values).
Examples: salary, time, volume of milk in a container, etc. Since a continuous
random variable can assume infinitely many values, the probability of each
individual value is zero. As such, we can only determine the probability of a
range of values.
Figure 1a: Histogram of the gas-mileages of 49 mid-sized cars.
[Figure: histogram of Mileage, with Frequency on the vertical axis and Mileage on the horizontal axis.]
K.H. Chen
[Figures: density histogram of Mileage (Density against Mileage) and a plot of f(x) against x.]
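2. The total area under the curve between a and b is 1.0, i.e.,
   $\int_a^b f(x)\,dx = 1$
The probability that X falls between c and d is the area under the curve
between c and d:
   $P(c \le X \le d) = \int_c^d f(x)\,dx$ for $c \ge a$ and $d \le b$
For a discrete random variable, the mean and variance are
   $E[X] = \mu = \sum_{\text{all } x} x \cdot p(x)$
   and $V[X] = \sigma^2 = E[(X - \mu)^2] = \sum_{\text{all } x} (x - \mu)^2 \cdot p(x)$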
The mean and variance of a continuous random variable, which ranges between a
and b are determined in a similar fashion using the integral sign rather than the
summation sign. That is,
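Mean
   $E[X] = \mu = \int_a^b x \cdot f(x)\,dx$
Variance
   $V[X] = \sigma^2 = E[(X - \mu)^2] = \int_a^b (x - \mu)^2 \cdot f(x)\,dx$
   $= \int_a^b x^2 \cdot f(x)\,dx - \mu^2$
   $= E[X^2] - \mu^2$, where $E[X^2] = \int_a^b x^2 \cdot f(x)\,dx$
Standard deviation
   $\sigma = \sqrt{V(X)}$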
Example 1:
After playing golf for many years, a statistics professor determined the density
function for the distance his drives travel in hundreds of yards (denoted by X). It
is
   $f(x) = \frac{3}{19} x^2$ for $2 \le x \le 3$
a. Confirm that the above function satisfies the requirements for a probability
density function.
[Figure: plot of f(x) vs x for 2 ≤ x ≤ 3.]
From the above plot, we can see that f(x) > 0 for $2 \le x \le 3$ and thus the first
requirement for a probability density function is met.
The total area under f(x) for $2 \le x \le 3$ is
   $\int_2^3 \frac{3}{19} x^2\,dx = \frac{3}{19} \int_2^3 x^2\,dx = \frac{3}{19} \left[ \frac{x^3}{3} \right]_2^3 = \left[ \frac{x^3}{19} \right]_2^3 = \frac{(3)^3 - (2)^3}{19} = \frac{27 - 8}{19} = 1$
The second requirement for a probability density function is also met. Thus,
$f(x) = \frac{3}{19} x^2$ for $2 \le x \le 3$ satisfies the requirements for a probability density
function.
b. Find the probability that the professor's next drive is more than 250 yards.
   $P(X > 2.5) = \int_{2.5}^3 \frac{3}{19} x^2\,dx = \left[ \frac{x^3}{19} \right]_{2.5}^3 = \frac{(3)^3 - (2.5)^3}{19} = \frac{27 - 15.625}{19} = 0.5987$
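c. The mean, variance, and standard deviation of X are found as follows.
   $\mu = E[X] = \int_2^3 x \cdot f(x)\,dx = \int_2^3 \frac{3}{19} x^3\,dx = \left[ \frac{3x^4}{76} \right]_2^3 = \frac{243 - 48}{76} = 2.5658$ (256.58 yards)
   $E[X^2] = \int_2^3 x^2 \cdot f(x)\,dx = \int_2^3 \frac{3}{19} x^4\,dx = \left[ \frac{3x^5}{95} \right]_2^3 = \frac{729 - 96}{95} = \frac{633}{95}$
   Variance, $\sigma^2 = V[X] = E[X^2] - \mu^2 = \frac{633}{95} - (2.5658)^2 = 0.07988227149$ (798.823 yards$^2$)
   Standard deviation, $\sigma = \sqrt{0.07988227149} = 0.2826345193$ (28.26 yards)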
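The calculations in Example 1 can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a hand-written Simpson's-rule helper (the name `integrate` is hypothetical) and reuses the density $f(x) = \frac{3}{19}x^2$ on [2, 3] from the example.

```python
def integrate(g, lo, hi, steps=10_000):
    """Composite Simpson's rule on [lo, hi]; steps must be even."""
    h = (hi - lo) / steps
    total = g(lo) + g(hi)
    for k in range(1, steps):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if k % 2 else 2) * g(lo + k * h)
    return total * h / 3


def f(x):
    """Density from Example 1: f(x) = (3/19) x^2 for 2 <= x <= 3."""
    return 3.0 * x**2 / 19.0


area = integrate(f, 2.0, 3.0)                    # requirement 2: total area
p_over_250 = integrate(f, 2.5, 3.0)              # part b: P(X > 2.5)
mean = integrate(lambda x: x * f(x), 2.0, 3.0)   # part c: E[X]
second_moment = integrate(lambda x: x**2 * f(x), 2.0, 3.0)  # E[X^2]
variance = second_moment - mean**2               # V[X] = E[X^2] - mu^2
std_dev = variance ** 0.5

print(f"area={area:.4f}  P(X>2.5)={p_over_250:.4f}")
print(f"mean={mean:.4f}  variance={variance:.6f}  sd={std_dev:.4f}")
```

The results should agree with the hand calculations above to the number of decimals shown there.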
Uniform/Rectangular Distribution
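A continuous random variable X is a uniform random variable over an interval
$a \le x \le b$, or equivalently [a, b], if X can take on any value in the closed
interval [a, b] and if the probability density function of X is constant over this
interval. That is,
   $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise
[Figure: plot of f(X) with height $\frac{1}{b - a}$.]
The mean, variance, and standard deviation of X are
   $E(X) = \frac{a + b}{2}$ and $V(X) = \frac{(b - a)^2}{12}$
   $\sigma_X = \frac{b - a}{\sqrt{12}}$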
Example 2:
The weekly output of a steel mill is a uniformly distributed random variable that
lies between 110 and 175 metric tons.
a.
b.
c.
d.
e.
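The uniform formulas can be applied to the steel mill of Example 2 as follows. This is only a sketch: the questions for parts a to e are not available here, so the two probabilities computed at the end use hypothetical thresholds (120 and 130 to 150 metric tons) chosen purely for illustration, and the helper name `uniform_cdf` is an assumption.

```python
a, b = 110.0, 175.0  # weekly output limits (metric tons) from Example 2

mean = (a + b) / 2                # E(X) = (a + b) / 2
variance = (b - a) ** 2 / 12      # V(X) = (b - a)^2 / 12
std_dev = (b - a) / 12 ** 0.5     # sigma_X = (b - a) / sqrt(12)


def uniform_cdf(x, lo=a, hi=b):
    """P(X < x) for a uniform random variable on [lo, hi]."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


# Hypothetical questions, not taken from the notes:
p_below_120 = uniform_cdf(120)                           # P(X < 120)
p_130_to_150 = uniform_cdf(150) - uniform_cdf(130)       # P(130 < X < 150)

print(mean, round(variance, 4), round(std_dev, 4))
print(round(p_below_120, 4), round(p_130_to_150, 4))
```

Because the density is constant, every probability reduces to (length of the interval) / (b − a).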
Normal Distribution
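A continuous random variable X with the following probability density function is
called a normal random variable:
   $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$ for $-\infty < x < +\infty$
In short, we write $X \sim N(\mu, \sigma^2)$. Standardizing X gives
   $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.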
Note that the above standardized normal random variable, Z is called the standard
normal random variable and it has the following probability density function
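   $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}$ for $-\infty < z < +\infty$
and its distribution is called the standard normal distribution. In short, we write
$Z \sim N(0, 1)$. Here, $E(Z) = \mu_Z = 0$ and $V(Z) = \sigma_Z^2 = 1$.
[Figure: plot of the standard normal density f(Z) against Z.]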
From the table, P(Z < 1.95) = 0.9744
From the Session Window of MINITAB, P(Z < 1.95) = 0.974412
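The same probability can be obtained outside MINITAB. As a minimal sketch, assuming Python 3.8 or later, the standard library class `statistics.NormalDist` provides the cumulative distribution function of a normal random variable:

```python
from statistics import NormalDist

Z = NormalDist()      # defaults to mean 0 and standard deviation 1
p = Z.cdf(1.95)       # P(Z < 1.95)

print(f"P(Z < 1.95) = {p:.6f}")
```

The result should match the table value to four decimals and the MINITAB value to six.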
Note: When you need to calculate probabilities other than those of the
$P(-\infty < Z < z)$ or $P(Z < z)$ type, you need to express them in terms of
probabilities of the $P(Z < z)$ type. Homework problems will give you the chance
to practice doing so. Figure 4 depicts the way to do these manipulations.
Figure 4: Visualization of the simple arithmetic manipulations needed to express
other types of probabilities in terms of the P(Z < z) type.
[Figure 4: nine standard normal distribution plots of f(Z) against Z. The shaded-area labels include 0.977, 0.136, 0.841, 1.000, 0.0228, 0.819, and 0.159; the axis marks include -3.09, 3.09, and -1.]
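The manipulations that Figure 4 illustrates can be sketched in code. The sketch below assumes, based on the area labels that survive from the figure, that the panels use z = 1 and z = 2; the helper names `prob_greater` and `prob_between` are hypothetical.

```python
from statistics import NormalDist

phi = NormalDist().cdf   # phi(z) = P(Z < z) for the standard normal


def prob_greater(z):
    """P(Z > z) = 1 - P(Z < z)."""
    return 1.0 - phi(z)


def prob_between(z1, z2):
    """P(z1 < Z < z2) = P(Z < z2) - P(Z < z1)."""
    return phi(z2) - phi(z1)


print(f"P(Z < 2)      = {phi(2):.4f}")
print(f"P(Z > 2)      = {prob_greater(2):.4f}")
print(f"P(Z < 1)      = {phi(1):.4f}")
print(f"P(Z > 1)      = {prob_greater(1):.4f}")
print(f"P(1 < Z < 2)  = {prob_between(1, 2):.4f}")
print(f"P(-1 < Z < 2) = {prob_between(-1, 2):.4f}")
```

Every case is reduced to one or two evaluations of P(Z < z), which is exactly what the standard normal table supplies.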
Example 4:
Let X be a normally distributed random variable with mean μ = 40 and standard
deviation σ = 5. Find the probability P(X < 49).
To compute P(X < 49) using the standard normal table, we need to standardize X:
Figure 5: P(X < 49)
[Figure: distribution plots of f(X) and f(Z); standardizing maps the area 0.964 to the left of X = 49 onto the same area to the left of Z = 1.8.]
   $P(X < 49) = P\left( \frac{X - 40}{5} < \frac{49 - 40}{5} \right) = P(Z < 1.8) = 0.9641$
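Example 4 can be verified in two equivalent ways: by standardizing first, or by working with N(40, 5²) directly. This is a minimal sketch using `statistics.NormalDist` from the Python standard library (3.8 or later).

```python
from statistics import NormalDist

mu, sigma = 40.0, 5.0

# Route 1: standardize, then use the standard normal distribution.
z = (49 - mu) / sigma
p_standardized = NormalDist().cdf(z)

# Route 2: evaluate the N(40, 5^2) distribution function directly.
p_direct = NormalDist(mu, sigma).cdf(49)

print(f"z = {z:.1f}, P(X < 49) = {p_standardized:.4f} = {p_direct:.4f}")
```

Both routes give the same probability, which is why a single standard normal table suffices for every normal distribution.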
Example 6:
The life of a calculator manufactured by CASIO is normally distributed with μ =
50 months and σ = 8 months. What should the warranty period be if the
company does not want to replace more than 5% of its products?
Figure 6a: $P(X < x_{0.95}) = 0.05$, where $x_{0.95}$ denotes the 5th percentile of X ~ N(50, 64).
[Figure: distribution plot of f(X) with an area of 0.05 shaded to the left of $x_{0.95}$; the mean 50 is marked on the axis.]
Figure 6b: $P(Z < z_{0.95}) = 0.05$, where $z_{0.95}$ denotes the 5th percentile of Z ~ N(0, 1).
[Figure: distribution plot of f(Z) with an area of 0.05 shaded to the left of $z_{0.95}$.]
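The percentile in Figures 6a and 6b can be computed with an inverse distribution function. The following is a sketch of one way to answer Example 6, assuming the warranty period is set at the 5th percentile of the lifetime distribution; it is not taken from the original notes.

```python
from statistics import NormalDist

mu, sigma = 50.0, 8.0          # months, from Example 6

# 5th percentile of Z ~ N(0, 1), then un-standardize: x = mu + sigma * z.
z_05 = NormalDist().inv_cdf(0.05)
warranty_from_z = mu + sigma * z_05

# Equivalent direct computation on X ~ N(50, 64).
warranty_direct = NormalDist(mu, sigma).inv_cdf(0.05)

print(f"z = {z_05:.5f}, warranty = {warranty_direct:.2f} months")
```

A warranty period no longer than this value keeps the expected share of replaced calculators at or below 5%.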
Example 7:
At a certain university, the SAT scores on the verbal portion of the first-year
students are normally distributed with mean 520 and standard deviation 40.
a. Find the proportion of first-year students whose SAT scores on the verbal
   portion are between 500 and 650.
b. How high must a verbal test score be in order to be among the highest 5% of
   test scores?
c. If 5 first-year students are randomly selected, what is the probability that
   there will be 3 students whose scores are between 500 and 650?
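A possible way to work through Example 7 is sketched below. It is an illustration rather than the official solution: part c is read as "exactly 3 of the 5 students", and the students are assumed to be independent, so the count follows a binomial distribution.

```python
from math import comb
from statistics import NormalDist

scores = NormalDist(mu=520, sigma=40)   # verbal SAT scores, Example 7

# a. Proportion of scores between 500 and 650.
p_between = scores.cdf(650) - scores.cdf(500)

# b. Score that cuts off the highest 5% (the 95th percentile).
cutoff_top5 = scores.inv_cdf(0.95)

# c. Exactly 3 of 5 independent students fall between 500 and 650:
#    binomial probability C(5, 3) * p^3 * (1 - p)^2.
p_exactly_3 = comb(5, 3) * p_between**3 * (1 - p_between) ** 2

print(f"a. {p_between:.4f}  b. {cutoff_top5:.2f}  c. {p_exactly_3:.4f}")
```

Part c shows how a normal probability can feed a discrete model: the normal distribution supplies p, and the binomial formula handles the count.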
A normal quantile (Q-Q) plot is a graph designed to show whether a normal model
is a reasonable description of the variation in the data. The basic idea behind the
normal quantile plot is to compare the data values with the values one would
expect from a standard normal distribution. The comparison is based on the idea
of quantiles.
Example 8:
0.0 0.3 0.1 0.5 0.4 2.8 2.6 1.3 0.5 2.6
To construct a normal quantile plot, do the following:
1. Sort the data in ascending order (see Column III in the table below).
2. Determine which quantile each data value represents. In this example, the
   smallest of the 10 values represents the smallest 10% of the data. We will
   consider this data value to lie halfway between 0% and 10% (the middle of
   the lowest 10%). In general, the computation $\frac{i - 0.5}{n}$ gives the desired
   value of the position (expressed as a decimal), since that is halfway between
   $\frac{i - 1}{n}$ and $\frac{i}{n}$ (see Column IV in the table below).
3. Compute the theoretical quantile of the standard normal distribution,
   $z^*$ or $z_{(i - 0.5)/n}$, that corresponds to the proportion computed in Column IV.
A computer can get the value for the proportion 0.05 more accurately and
indicates that it is −1.64485. MINITAB will give you this value if you type
invcdf 0.05 next to the MTB > prompt in the Session Window or if you use the
menu under Calc > Probability Distributions > Normal.
A normal quantile plot is then constructed by plotting the values under Column III
($x_{(i)}$) against the values under Column V ($z^*$ or $z_{(i - 0.5)/n}$).
If the data came perfectly from a standard normal distribution, Columns III and V
of the table below would be identical (the theoretical quantile and the data value
would match). This means that all the points would fall along the straight line y =
x. Since other normal distributions are just linear transformations of the standard
normal distribution ($x = \mu + \sigma z$), perfect data from a normal distribution with
mean μ and standard deviation σ would give a line with slope σ and intercept μ.
We use normal quantile plots to assess the plausibility that a data set is a sample
from a normally distributed population. If the resulting plot is approximately
linear, then it is plausible that the data come from a normal distribution. Else (if
the plot is markedly nonlinear), it is doubtful that the data come from a normal
distribution. Of course, this will work much better for large data sets than for
small data sets.
I          II        III               IV              V
Position   Data      Sample Quantile   Proportion      Theoretical
i          Value     (Sorted Data      below x_(i):    Quantile
           x_i       Value) x_(i)      (i - 0.5)/n     z* or z_{(i-0.5)/n}
 1          0.0      -1.3              0.05            -1.64485
 2         -0.3      -0.5              0.15            -1.03643
 3         -0.1      -0.4              0.25            -0.67449
 4          0.5*     -0.3              0.35            -0.38532
 5         -0.4      -0.1              0.45            -0.12566
 6          2.8       0.0              0.55             0.12566
 7          2.6       0.5              0.65             0.38532
 8         -1.3       2.6              0.75             0.67449
 9          0.5*      2.6              0.85             1.03643
10          2.6       2.8              0.95             1.64485
* The sorted column contains both -0.5 and 0.5, so one of these two data values is -0.5.
[Figure: normal quantile plot of Theoretical Quantiles against Sample Quantiles.]
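The three construction steps can be sketched in a few lines of code. This sketch assumes the sorted values of Column III, with the negative signs implied by their ascending order, and uses `statistics.NormalDist.inv_cdf` from the Python standard library in place of MINITAB's invcdf command.

```python
from statistics import NormalDist

# Sample values as listed in Column III (sorted data values).
data = [-1.3, -0.5, -0.4, -0.3, -0.1, 0.0, 0.5, 2.6, 2.6, 2.8]

# Step 1: sort the data in ascending order (Column III).
sample_quantiles = sorted(data)
n = len(sample_quantiles)

# Step 2: proportion below each sorted value, (i - 0.5) / n (Column IV).
proportions = [(i - 0.5) / n for i in range(1, n + 1)]

# Step 3: matching standard normal quantiles (Column V).
standard_normal = NormalDist()
theoretical_quantiles = [standard_normal.inv_cdf(p) for p in proportions]

for x, p, z in zip(sample_quantiles, proportions, theoretical_quantiles):
    print(f"{x:5.1f}  {p:4.2f}  {z:9.5f}")
```

Plotting `sample_quantiles` against `theoretical_quantiles` with any plotting tool then gives the normal quantile plot; a roughly straight pattern supports the normal model.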
Exponential Distribution
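A continuous random variable X is exponentially distributed if its probability
density function is given by
   $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$
where $\lambda > 0$. Its mean and standard deviation are equal:
   $E[X] = \mu_X = \frac{1}{\lambda}$ and $\sqrt{V[X]} = \sigma_X = \frac{1}{\lambda}$.
[Figure: exponential density curves f(X) against X, with legend entries Mean = 0.5, Mean = 1, and Mean = 2.]
For a given value $x^*$ (or values $x_1^* < x_2^*$):
a. $P(X > x^*) = e^{-\lambda x^*}$
b. $P(X < x^*) = 1 - e^{-\lambda x^*}$
c. $P(x_1^* < X < x_2^*) = P(X < x_2^*) - P(X < x_1^*) = e^{-\lambda x_1^*} - e^{-\lambda x_2^*}$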
Note that if the number of arrivals follows a Poisson distribution, the times
between arrivals follow an exponential distribution.
Example 9:
Toll booths on the New York State Thruway are often congested because of the
large number of cars waiting to pay. A consultant working for the state concluded
that if service times are measured from the time a car stops in line until it leaves,
service times are exponentially distributed with a mean of 2.7 minutes.
a. What is the probability that a car will take more than 2 minutes to get
   through the toll booth?
b. What is the probability that a car will take less than 3 minutes to get through
   the toll booth?
c. What is the probability that a car will take at least 2 but no more than 4
   minutes to get through the toll booth?
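The three toll booth questions map directly onto the three exponential probability formulas. The sketch below is an illustration, not the official solution; it assumes the rate is the reciprocal of the stated mean service time, and the helper names are hypothetical.

```python
from math import exp

mean_service = 2.7            # minutes, from Example 9
rate = 1.0 / mean_service     # exponential rate = 1 / mean


def prob_greater(x):
    """P(X > x) = exp(-rate * x)."""
    return exp(-rate * x)


def prob_less(x):
    """P(X < x) = 1 - exp(-rate * x)."""
    return 1.0 - exp(-rate * x)


def prob_between(x1, x2):
    """P(x1 < X < x2) = exp(-rate * x1) - exp(-rate * x2)."""
    return exp(-rate * x1) - exp(-rate * x2)


p_a = prob_greater(2)      # a. more than 2 minutes
p_b = prob_less(3)         # b. less than 3 minutes
p_c = prob_between(2, 4)   # c. between 2 and 4 minutes

print(f"a. {p_a:.4f}  b. {p_b:.4f}  c. {p_c:.4f}")
```

Since X is continuous, "at least 2 but no more than 4" and "strictly between 2 and 4" have the same probability.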
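The sampling distribution of the sample mean $\bar{X}$ has mean $\mu$ and variance
$\frac{\sigma_X^2}{n}$ or $\frac{\sigma^2}{n}$. That is,
   $E(\bar{X}) = \mu_{\bar{X}} = \mu$ (or $\mu_X$) and $Var(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma_X^2}{n}$ or $\frac{\sigma^2}{n}$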
Furthermore,
if X is a normal random variable (the population from which the samples are
drawn is normally distributed), then $\bar{X}$ is also a normal random variable
(the probability distribution of the sample mean $\bar{X}$ is also normal)
with mean $E(\bar{X}) = \mu_{\bar{X}} = \mu$ and variance $Var(\bar{X}) = \sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$;
if X is not a normal random variable (the population from which the samples
are drawn is not normally distributed), then $\bar{X}$ is approximately a normal
random variable (the probability distribution of the sample mean $\bar{X}$ is
approximately normal) provided n is large, according to the
Central Limit Theorem. In many practical situations, a sample size of 30
($n \ge 30$) may be sufficiently large to allow us to use the normal approximation
for the sampling distribution of $\bar{X}$. However, if the population is extremely
nonnormal (for example, bimodal or highly skewed distributions), the
sampling distribution will also be nonnormal even for moderately large
values of n.
Example 10:
a. X ~ N(50, 64)
[Figure: histograms of the sample means (k = 1,000,000), Frequency against Means, for n = 5, 10, 15, 20, 25, and 30.]
Descriptive statistics of the sample means (N = 1,000,000 for each variable; Mode = *, N for Mode = 0):
Variable         Mean     StDev   Variance   Minimum   Q1       Median   Q3       Maximum
Means (n = 5)    49.997   3.578   12.803     33.563    47.580   49.996   52.409   67.410
Means (n = 10)   49.998   2.525    6.373     36.832    48.292   49.998   51.701   62.532
Means (n = 15)   50.000   2.067    4.272     39.973    48.604   49.998   51.393   60.710
Means (n = 20)   50.000   1.791    3.206     41.550    48.790   50.001   51.208   58.101
Means (n = 25)   50.001   1.600    2.559     42.476    48.923   50.002   51.081   57.262
Means (n = 30)   50.003   1.459    2.130     42.315    49.017   50.003   50.987   56.804
b. X ~ Unif(25, 75)
[Figure: histograms of the sample means (k = 1,000,000), Frequency against Means, for n = 5, 10, 15, 20, 25, and 30.]
Descriptive statistics of the sample means (N = 1,000,000 for each variable; Mode = *, N for Mode = 0):
Variable         Mean     StDev   Variance   Minimum   Q1       Median   Q3       Maximum
Means (n = 5)    49.994   6.451   41.616     26.619    45.536   49.997   54.460   73.408
Means (n = 10)   49.998   4.561   20.800     30.057    46.872   49.994   53.117   69.436
Means (n = 15)   49.993   3.731   13.921     32.819    47.457   49.995   52.531   67.824
Means (n = 20)   50.002   3.229   10.427     35.332    47.812   50.002   52.198   64.841
Means (n = 25)   50.000   2.883    8.309     37.017    48.047   50.002   51.953   63.044
Means (n = 30)   49.996   2.636    6.949     38.557    48.212   49.996   51.782   62.576
c. X ~ exp(50)
[Figure: histograms of the sample means (k = 1,000,000), Frequency against Means, for n = 5, 10, 15, ..., 100.]
Descriptive statistics of the sample means (N = 1,000,000 for each variable; Mode = *, N for Mode = 0):
Variable          Mean     StDev    Variance   Minimum   Q1       Median   Q3       Maximum
Means (n = 5)     50.030   22.350   499.538     2.441    33.723   46.738   62.776   222.738
Means (n = 10)    50.016   15.834   250.717     5.632    38.622   48.375   59.601   164.371
Means (n = 15)    49.997   12.914   166.765     8.239    40.773   48.892   58.009   143.829
Means (n = 20)    50.005   11.194   125.304    13.661    42.074   49.162   57.033   124.532
Means (n = 25)    50.013   10.000   100.002    15.563    42.950   49.356   56.353   120.890
Means (n = 30)    49.989    9.130    83.355    18.710    43.573   49.428   55.808   111.209
Means (n = 35)    50.004    8.450    71.402    20.072    44.075   49.529   55.417   105.860
Means (n = 40)    49.993    7.902    62.447    21.147    44.459   49.571   55.064    94.210
Means (n = 45)    50.005    7.460    55.647    21.845    44.800   49.635   54.813    96.548
Means (n = 50)    50.009    7.071    49.998    22.887    45.074   49.683   54.586    91.011
Means (n = 55)    49.999    6.736    45.376    24.549    45.302   49.698   54.363    93.361
Means (n = 60)    49.991    6.457    41.688    23.093    45.502   49.709   54.173    87.456
Means (n = 65)    50.000    6.199    38.433    24.911    45.686   49.742   54.022    84.908
Means (n = 70)    49.992    5.980    35.762    27.437    45.835   49.745   53.890    83.259
Means (n = 75)    49.999    5.775    33.351    25.150    45.989   49.769   53.764    81.664
Means (n = 80)    50.000    5.583    31.171    27.228    46.132   49.793   53.646    82.805
Means (n = 85)    50.013    5.427    29.454    28.619    46.251   49.820   53.567    84.893
Means (n = 90)    50.006    5.268    27.749    28.847    46.355   49.826   53.458    81.011
Means (n = 95)    50.000    5.121    26.225    27.900    46.462   49.829   53.349    79.137
Means (n = 100)   49.998    5.003    25.031    27.543    46.537   49.827   53.275    77.490
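A simulation of this kind can be reproduced on a small scale. The sketch below is not the MINITAB procedure used for Example 10: it assumes that exp(50) denotes an exponential population with mean 50 (consistent with the reported averages near 50), uses k = 20,000 samples instead of 1,000,000 to keep the run short, and fixes an arbitrary seed for repeatability.

```python
import random
from statistics import fmean, variance


def sample_means(n, k, rng, population_mean=50.0):
    """Draw k samples of size n from an exponential population, return their means."""
    rate = 1.0 / population_mean   # expovariate takes the rate, not the mean
    return [fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(k)]


rng = random.Random(12345)   # arbitrary seed, chosen only for repeatability
k = 20_000                   # far fewer samples than in Example 10

results = {}
for n in (5, 30):
    means = sample_means(n, k, rng)
    results[n] = (fmean(means), variance(means))
    # Theory: E(X-bar) = 50 and Var(X-bar) = sigma^2 / n = 2500 / n.
    print(f"n={n:3d}  mean={results[n][0]:7.3f}  "
          f"variance={results[n][1]:8.3f}  theory={2500 / n:8.3f}")
```

The simulated variances should land close to 2500/n, the same pattern visible in the Variance column of the table for part c.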
Example 11:
An automatic machine in a manufacturing process is operating properly if the
lengths of an important subcomponent are normally distributed with mean 117 and
standard deviation 5.2 (in centimeters).
a. Find the probability that one randomly selected subcomponent is longer than
   120 cm.
b. Find the sampling distribution of the sample mean from a random sample of
   size 4.
c. Find the probability that if four subcomponents are randomly selected, their
   mean length exceeds 120 cm.
d. Find the probability that if four subcomponents are randomly selected, all
   four have lengths that exceed 120 cm.
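One way to approach Example 11 is sketched below. It is an illustration rather than the official solution, and part d assumes the four lengths are independent, so the individual probabilities multiply.

```python
from statistics import NormalDist

mu, sigma, n = 117.0, 5.2, 4

# a. One subcomponent: X ~ N(117, 5.2^2).
length = NormalDist(mu, sigma)
p_one = 1.0 - length.cdf(120)

# b. Sample mean of n = 4: X-bar ~ N(117, 5.2^2 / 4), so sd = 5.2 / sqrt(4).
mean_of_4 = NormalDist(mu, sigma / n**0.5)

# c. Probability that the mean of four lengths exceeds 120 cm.
p_mean = 1.0 - mean_of_4.cdf(120)

# d. All four individual lengths exceed 120 cm (independence assumed).
p_all_four = p_one**n

print(f"a. {p_one:.4f}")
print(f"b. mean={mean_of_4.mean}, sd={mean_of_4.stdev:.2f}")
print(f"c. {p_mean:.4f}  d. {p_all_four:.5f}")
```

Parts c and d differ because a mean above 120 cm does not require every individual length to be above 120 cm.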
Example 12:
The restaurant in a large commercial building provides coffee for the building's
occupants. The restaurateur has determined that the mean number of cups of
coffee consumed in a day per occupant is 2.0, with a standard deviation of
0.6. A new tenant of the building intends to have a total of 125 new employees.
What is the probability that the new employees will consume more than 240 cups
per day?
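Example 12 can be approached through the sampling distribution of the mean. The sketch below is an illustration, not the official solution: it assumes the 125 employees behave like a random sample, so that by the Central Limit Theorem their mean consumption is approximately normal, and it restates "more than 240 cups in total" as "a mean of more than 240/125 cups per employee".

```python
from statistics import NormalDist

mu, sigma = 2.0, 0.6     # cups per occupant per day, from Example 12
n = 125                  # new employees

# Total > 240 cups is the same event as sample mean > 240 / 125 = 1.92 cups.
threshold_mean = 240 / n

# Central Limit Theorem: X-bar is approximately N(mu, sigma^2 / n).
xbar = NormalDist(mu, sigma / n**0.5)
p_more_than_240 = 1.0 - xbar.cdf(threshold_mean)

print(f"threshold mean = {threshold_mean:.2f}, "
      f"P(total > 240) = {p_more_than_240:.4f}")
```

With n = 125 the normal approximation is reasonable even though the distribution of individual consumption is not stated.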